Classification of data with a qudit, a geometric approach

We propose a model for data classification using isolated quantum $d$-level systems, or qudits. The procedure consists of an encoding phase, where classical data are mapped onto the surface of the qudit's Bloch hyper-sphere via rotation encoding, followed by a rotation of the sphere and a projective measurement. The rotation is adjustable in order to control the operator to be measured, while additional weights introduced in the encoding phase adjust the mapping onto the Bloch hyper-surface. During the training phase, a cost function based on the average expectation value of the observable is minimized using gradient descent, thereby adjusting the weights. Using examples and performing a numerical estimation of the lossless memory dimension, we demonstrate that this geometrically inspired qudit model for classification is able to solve nonlinear classification problems using only a small number of parameters and without requiring entangling operations.

With low-depth quantum circuits coming to pass, interest in devising applications for these physical units has much increased. One fast-developing direction, which already forms a sub-discipline of Quantum Machine Learning [1,2], is devising methods for addressing problems of classical machine learning (ML) with variational quantum circuits (VQCs) [3]. These types of quantum circuits have adjustable angles in their gates which can be trained in a fashion analogous to neural networks [4-7]. On a formal level, though, the mathematical analogy between VQCs and neural networks is far from straightforward, mainly due to the reversibility of VQCs, and the problem of the quantum neuron is usually approached with more intricate models [8-13]. In addition to neural networks, VQCs show similarities with classical kernel machines [7,14,15] by generating a feature map of classical data in the Hilbert space. In general, the interpretation and most profitable use of VQCs in ML tasks remains an open topic of discussion, including the accurate evaluation of their capacity and their potential advantages compared to classical models. This work aims to contribute to the question of whether quantum circuits are suitable for solving ML tasks and how increasing the dimension of the Hilbert space can be exploited for this purpose. There are two paths to follow: one is to employ $n$ entangled qubits, achieving an exponential increase of the space; the other, less investigated, path is to employ qudits. For a single qudit, the dimension of the Hilbert space increases linearly with $d$, without requiring entangling operations, which remain demanding on a practical level. Our quantum toy model consists of a single qudit operated by a low-depth quantum circuit (which we call a single layer). With these limited resources, we are able to show that, with a proper encoding and adjustment of $d$ with respect to the dimension of the input, one may achieve double the lossless memory (LM) dimension [16] as compared to conventional single-layer neural networks (NN) possessing the same number of trainable parameters. This effect cannot be achieved in the absence of parameters in the encoding phase controlling the feature map on the Bloch hyper-sphere.
Going one step further, while keeping the input dimension fixed, the capacity of the quantum system (in the sense of LM dimension) can be further increased either by re-uploading the data [6,17], thereby introducing more depth into the quantum circuit, or, alternatively, as we propose in this work, by using higher-dimensional quantum systems, i.e., by increasing $d$. We obtain preliminary evidence that the two methods give comparable results, and therefore the selection should be made depending on the available resources.
The structure of the manuscript is as follows. We start by introducing the Bloch hyper-sphere representation of a qudit and, based on this, we develop a general scheme for mapping the data onto its surface and rotating them. We then evaluate different encoding-rotation models according to the LM dimension and draw conclusions on the optimal methodology. We illustrate the efficiency of qubit and qutrit models by applying them to standard classification problems, including both synthetic and real-world data.

I. THE BLOCH HYPER-SPHERE OF A QUDIT
A qudit stands for the state of a $d$-level quantum system, just as a qubit describes a quantum 2-level system. A qudit state 'lives' in the $d$-dimensional Hilbert space which is spanned by the eigenstates of the Hamiltonian of the system. Let us denote by $\{|k\rangle\}_{k=0}^{d-1}$ such a set of normalized eigenstates. We assume a full $su(d)$ algebra for the system, spanned by $d^2-1$ generators $\{\hat{g}_i\}$ that can be chosen orthogonal with respect to the Hilbert-Schmidt product, such that $\mathrm{Tr}(\hat{g}_i^\dagger \hat{g}_j) = G\,\delta_{i,j}$ with $G$ a positive constant. For $d = 2$ these generators can be identified with the Pauli operators ($G = 2$), and for $d = 3$, again with $G = 2$, with the Gell-Mann operators (see Appendix). Extending the set $\{\hat{g}_i\}$ by the element $\hat{g}_0 = \sqrt{G/d}\,\hat{\mathbb{1}}$, the generators of the algebra form a basis in the Hilbert-Schmidt space of Hermitian operators, so that any observable $\hat{H}$ of the qudit can be written as $\hat{H} = \varphi\, \vec{n}\cdot\hat{\vec{g}}$ (2), with $\vec{n} = \{n_1, \ldots, n_{d^2}\}$ a normalized real vector and $\varphi$ an angle.
The density operator $\hat{\rho}$ of a pure state $|\psi\rangle$, $\hat{\rho} = |\psi\rangle\langle\psi|$, being a positive semidefinite Hermitian matrix with $\mathrm{Tr}(\hat{\rho}) = 1$, can also be decomposed on the basis of the generators as $\hat{\rho} = \frac{1}{d}\hat{\mathbb{1}} + \sum_{m=1}^{d^2-1} r_m \hat{g}_m$ (3), with $r_m = \mathrm{Tr}(\hat{g}_m^\dagger \hat{\rho})/G$ and $\vec{r} = \{r_1, r_2, \ldots, r_{d^2-1}\}$ proportional, by a factor $1/G$, to the unit-length Bloch vector living on the $(d^2-2)$-dimensional surface of the so-called Bloch hyper-sphere. For completeness, we note here that pure states occupy only a sub-manifold of this surface of dimension $d^2-d$, while the rest of the surface corresponds to non-positive Hermitian matrices. Mixed states correspond to vectors inside the Bloch hyper-sphere. Furthermore, since any unitary operation $\hat{U}$ is generated by a Hermitian matrix $\hat{H}$ as $\hat{U} = e^{i\hat{H}}$, in view of the decomposition (2) one can rewrite $\hat{U} = e^{i\varphi\,\vec{n}\cdot\hat{\vec{g}}}$ up to a phase factor. The latter expression leads (with some extra work) to the interpretation of a unitary operation acting on a pure state, $\hat{U}|\psi\rangle$ (or $\hat{U}\hat{\rho}\hat{U}^\dagger$), as a rotation of the Bloch vector around the $\vec{n}$-axis by an angle proportional to $\varphi$. One can also see that the most general unitary operation $\hat{U}_{\vec{n}}(\varphi)$ is parameterized by $d^2-1$ real parameters.
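As a numerical illustration of this decomposition for the qubit case (a minimal sketch, not part of the experimental protocol), the script below computes the components $r_m = \mathrm{Tr}(\hat{g}_m \hat{\rho})/G$ for a pure state and verifies that $G\,\vec{r}$ is the unit-length Bloch vector:

```python
import numpy as np

# Pauli generators of su(2); Tr(g_i^† g_j) = G δ_ij with G = 2
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
G = 2.0

def bloch_vector(psi):
    """Components r_m = Tr(g_m rho)/G of the (scaled) Bloch vector of a pure state."""
    rho = np.outer(psi, psi.conj())
    return np.array([np.trace(g @ rho).real / G for g in (sx, sy, sz)])

plus = np.array([1, 1]) / np.sqrt(2)       # the |+> state
r = bloch_vector(plus)
assert np.allclose(G * r, [1, 0, 0])       # G*r is the unit Bloch vector (1, 0, 0)
assert np.allclose(G * bloch_vector(np.array([1, 0])), [0, 0, 1])  # ground state -> north pole
```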
Measurable quantities of a qudit are described by Hermitian operators which, again in view of the decomposition of Eq. (2), define a direction on the Bloch hyper-sphere. In addition, the $d$ eigenvectors of an observable, corresponding to $d$ different real measurement outcomes (eigenvalues), are mutually orthogonal and, in the absence of degeneracies, offer a separation of the Bloch hyper-surface into $d$ adjacent segments of equal area.

II. EMPLOYING QUDITS FOR SUPERVISED CLASSIFICATION TASKS
Let us consider classical data consisting of $n$ $k$-dimensional feature vectors $\{\vec{x}\}$, i.e., $\vec{x} = \{x_1, x_2, \ldots, x_k\}$. Every data point belongs to one of $M$ classes. A random subset of the data composed of $l$ elements ($l < n$), $\{\vec{x}\}_l$, is picked as the training set.

A. Quantum resources
For this problem, the required resource is a single qudit with $d^2-1 \ge k$, where $d^2-1-k$ increases with the complexity of the task. One should be able to perform the full $SU(d)$ group of operations on the qudit and, in addition, to measure a single observable $\hat{O}$. For simplicity, we assume the spectrum to be non-degenerate, yielding $d$ distinct measurement outcomes. Since the classification is based on mean values of measurement outcomes, one should be able to repeat the experiment under identical conditions multiple times.
B. Encoding
In the first part, there is the encoding phase, where the classical data, i.e., the elements of the vector $\vec{x}_i$, together with the adjustable weights $\vec{s}$, are "uploaded" on the qudit, which is initially in its ground state, via the encoding unitary of Eq. (4), where the sets $A_j$ denote different groupings of the generators, with $A_j \cap A_k = \emptyset$ a suggested condition. With $|0\rangle$ we denote the ground state of the qudit. Overall, the angles and axes of rotation of the initial vector $|0\rangle$ are related to both the classical data and the adjustable weights $\vec{s}$ in an intricate way, and the result of such an encoding is a map from the Cartesian space where the inputs are initially described (the $k$-dimensional real vector $\vec{x}$) onto the surface of the $(d^2-1)$-dimensional Bloch hyper-sphere. Given the requirement $k \le d^2-1$, we actually map the data onto a higher-dimensional feature space characterized by an associated kernel. In the Appendix, we provide an explicit expression for the simplest kernel employed in this work, namely that of the qubit model A (see Table I). In contrast to the usual rotation encoding, which consists of successive rotations around orthogonal directions of the Bloch sphere and results in cosine kernels, the 'combined' encoding of Eq. (4) results in more intricate kernels. Naturally, the complexity of these kernels increases with $d$ and $k$.
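As a concrete, simplified illustration of such a combined rotation encoding on a qubit (the particular grouping of generators below is our own choice for this sketch and does not reproduce any specific model of Table I), one can fold two features and one weight into a single rotation of $|0\rangle$, using the closed form $e^{i\varphi\,\vec{n}\cdot\vec{\sigma}} = \cos\varphi\,\mathbb{1} + i\sin\varphi\,\vec{n}\cdot\vec{\sigma}$:

```python
import numpy as np

# Pauli generators of su(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(n, phi):
    """exp(i*phi*n.sigma) = cos(phi) I + i sin(phi) n.sigma, for a unit axis n."""
    nsig = n[0] * sx + n[1] * sy + n[2] * sz
    return np.cos(phi) * np.eye(2) + 1j * np.sin(phi) * nsig

def encode(x, s):
    """Hypothetical combined encoding: the weighted features (s*x1, s*x2, 0)
    jointly set the axis and angle of one rotation applied to |0>."""
    v = np.array([s * x[0], s * x[1], 0.0])
    phi = np.linalg.norm(v)
    n = v / phi if phi > 1e-12 else np.array([1.0, 0.0, 0.0])
    return rot(n, phi) @ np.array([1, 0], dtype=complex)   # act on ground state |0>

psi = encode([0.3, -0.4], s=1.5)
assert np.isclose(np.linalg.norm(psi), 1.0)   # unitary encoding keeps the state normalized
```

The encoded state lands on the Bloch sphere's surface; its position depends non-trivially (through the axis-angle combination) on the weighted features, which is the source of the non-cosine kernels mentioned above.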

C. Rotating and measuring
After mapping the data onto the hyper-sphere, they are to be separated into $M$ groups. A projective measurement of the observable $\hat{O}$ provides, with a probability that depends on the state of Eq. (4), an outcome from the $d$ values of its spectrum $\{o_1, \ldots, o_d\}$ (arranged in increasing order).
We make the habitual assumption that the whole procedure can be repeated many times in an identical way, and we use the mean value $\langle\hat{O}\rangle$, which lies in the interval $[o_1, o_d]$, to divide this interval (equally or unequally) into $M$ segments classifying the data. To get optimal results, though, one should be able to rotate $\hat{O}$ in order to 'match' its orientation with that of the mapped data on the hyper-surface. Alternatively, one can keep $\hat{O}$ intact and rotate $|\psi(\vec{x}_l, \vec{s})\rangle$. So, in this stage, one applies arbitrary rotations to the state vector carrying the classical information, yielding the state of Eq. (6), and measures $\hat{O}$. Let us note that it is not always profitable in terms of capacity to keep all the weights $w_j$ in Eq. (6); some should be ignored or set to zero, so that $W \approx S$.
The whole 'encode-rotate-measure' scheme is repeated many times under identical conditions until a mean value for the measurement is obtained, which classifies the data point $\vec{x}_l$ according to the chosen segmentation. The threshold values $y_i$ can also be made adjustable, in the same way that the threshold values of perceptrons in neural networks are variable and optimizable.
One may summarize the total scheme in an 'encode-rotate-measure' diagram. Finally, while the full scheme could be written as a single unitary generated by some Hermitian operator $\hat{H}$, it is important to note that this $\hat{H}$ is highly non-linear in the input $\vec{x}$ due to the Baker-Campbell-Hausdorff (BCH) formula. Our scheme, in contrast, relies on the simple linear encoding of Eq. (4).
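The rotate-and-measure stage of the scheme above can be sketched for a qubit as follows; the rotation axis, angle, and the threshold at $\langle\hat{g}_3\rangle = 0$ are illustrative placeholders, not trained values:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(n, phi):
    """exp(i*phi*n.sigma) for a unit axis n = (nx, ny, nz)."""
    nsig = n[0] * sx + n[1] * sy + n[2] * sz
    return np.cos(phi) * np.eye(2) + 1j * np.sin(phi) * nsig

def classify(psi, w_axis, w_phi):
    """Apply the trainable rotation, take <g3> = <sigma_z>, threshold at 0."""
    psi = rot(w_axis, w_phi) @ psi
    mean_g3 = (psi.conj() @ sz @ psi).real
    return 0 if mean_g3 >= 0 else 1

ground = np.array([1, 0], dtype=complex)
# with no rotation, |0> has <g3> = +1 and falls in class 0
assert classify(ground, np.array([1.0, 0, 0]), 0.0) == 0
# a phi = pi/2 rotation about x maps |0> to <g3> = -1, i.e., class 1
assert classify(ground, np.array([1.0, 0, 0]), np.pi / 2) == 1
```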

D. Training
To perform the training, we define a loss function that penalizes misclassified data of the training set, while correctly classified data do not contribute to its value. Here, $T$ is the set of misclassified data of the training set, and $Y_i$ is the upper or lower edge of the spectral segment that characterizes the correct class of the $i$th point. In Section V C we use, for convenience, the cross-entropy loss function.
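The structure of this loss can be sketched as follows; note that the quadratic penalty is our assumption for illustration, since the exact functional form of $E$ is not reproduced in this excerpt, while the restriction of the sum to the misclassified set $T$ follows the text:

```python
def training_loss(expectations, boundaries, misclassified):
    """Only misclassified points (the set T) contribute; each is penalized by
    its squared distance between <O>_i and the boundary value Y_i of the
    spectral segment of its correct class (quadratic form assumed here)."""
    return sum((expectations[i] - boundaries[i]) ** 2 for i in misclassified)

E_vals = [0.4, -0.8, 0.9]   # measured mean values <O>_i
Y_vals = [-0.1, 0.0, 0.2]   # segment boundary values Y_i of the correct classes
loss = training_loss(E_vals, Y_vals, misclassified=[0, 2])
assert abs(loss - (0.5 ** 2 + 0.7 ** 2)) < 1e-9   # indices 0 and 2 contribute
assert training_loss(E_vals, Y_vals, misclassified=[]) == 0   # all correct: zero loss
```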
The optimization of the parameters implies a minimization of $E$, which is achieved (in all analyses apart from the examples in Section V C) by gradient descent. The landscape of $E$, though, contains a number of local minima and, when starting from a random initial point in the space of parameters $\vec{s}$ and $\vec{w}$, the procedure might get trapped in one of them. To improve the minimization, we use a sample of $l = 50$ initial points and pick the best result among all runs of gradient descent. When dealing with real-world data using a qutrit (in Section V C) and comparing its outcome with that obtained with classical models, a more advanced stochastic gradient descent is applied.
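The multi-start strategy described above can be sketched generically; for the sake of a runnable example we minimize a toy one-dimensional multi-minimum landscape rather than the quantum model's loss:

```python
import numpy as np

def gradient_descent(grad, theta0, lr=0.05, steps=200):
    """Plain fixed-step gradient descent."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

def multistart_minimize(loss, grad, dim, n_starts=50, rng=None):
    """Run gradient descent from n_starts random initial points and keep the
    best result, mitigating trapping in local minima (as described in the text)."""
    rng = rng or np.random.default_rng(0)
    candidates = (gradient_descent(grad, rng.uniform(-1.5, 1.5, dim))
                  for _ in range(n_starts))
    return min(candidates, key=loss)

# toy landscape (t^2 - 1)^2 with two equivalent global minima at t = ±1
loss = lambda th: float((th[0] ** 2 - 1.0) ** 2)
grad = lambda th: np.array([4.0 * th[0] * (th[0] ** 2 - 1.0)])
best = multistart_minimize(loss, grad, dim=1)
assert loss(best) < 1e-6 and np.isclose(abs(best[0]), 1.0, atol=1e-3)
```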

III. LOSSLESS MEMORY DIMENSION OF DIFFERENT ENCODING-ROTATION MODELS
In this section, we compare different models of encoding using a measure of capacity with a clear theoretical meaning that is also suitable for numerical evaluation. Our aim is not to accurately compare with the capacity of classical neural networks [18], but to identify the optimal way of introducing the trainable parameters in the encoding and rotating stages, Eqs. (4) and (6), of the proposed scheme. Due to limited computational capacity, our numerical tests are not 'exhaustive' but indicative.
We employ a recently suggested measure [16], constructed for evaluating the informational/memory capacity of multi-layered classical neural networks, the so-called LM dimension. This is a generalization of the Vapnik-Chervonenkis (VC) dimension [19] based on the work of MacKay [20], embedding the memory capacity into the Shannon communication model. The definition of the LM dimension [16] is the following: • The LM dimension $D_{LM}$ is the maximum integer such that, for any dataset with cardinality $n \le D_{LM}$ and points in random position, all possible labelings of this dataset can be represented by a function in the hypothesis space.
• A set of points $\{x_n\}$ in $K$-dimensional space is in random position if and only if, from any subset of size $< n$, it is not possible to infer anything about the positions of the remaining points.
For this measure, the authors showed analytically that the upper limit of the LM dimension scales linearly with the number of parameters in a classical neural network, with a proportionality factor equal to unity. In practice, a training method cannot be perfect, and this linear dependence therefore persists with a lower proportionality factor. For more details and the informational meaning of this measure, we refer the interested reader to the original work [16].
For quantum models, where analytical calculations are not available, we proceed with a numerical evaluation of the LM dimension, which we denote $\widetilde{D}_{LM}$. Naturally, $\widetilde{D}_{LM}$ lower-bounds $D_{LM}$, as can be understood from the procedure that we follow for each encoding-rotation model under test: • We set the dimension $k$ of the inputs of the model.
According to our general model for a qudit, we have $k \le d^2-1$. We generate a set of points $\{x_n\}$ in random position, which we call a random pattern, by selecting each of the $k$ coordinates from a uniform distribution on the interval $[-0.5, 0.5]$. We start with $n \le P$, where $P = S + W$ is the total number of parameters.
• According to the definition of the LM dimension, we treat only binary classification tasks, and we randomly attribute the vectors of the random pattern to two groups. For a given random pattern, one should test all $2^n$ different labelings; not having the computational capacity for this, for $n > 6$ we perform our estimate by taking a sample of 50 different random labelings.
• If the training of the parameters via gradient descent, with 50 different starting points, does not lead to a classification of the vectors of the random pattern into two groups with a 100% success ratio, we repeat with other random patterns $\{x_n\}$ until we find a pattern that is successfully classified for all tested labelings. However, we do not exceed 10 different random patterns under test. • The number $n$ is stepwise increased up to the point where the classification is no longer successful for any tested random pattern. The empirical LM dimension $\widetilde{D}_{LM}$ is the highest $n$ for which the classification is achieved for at least one random pattern (all tested labelings).
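The estimation protocol of the bullets above can be summarized in a code skeleton; here `train_and_test(X, y) -> bool` is a placeholder standing in for the full encode-rotate-measure training loop (multi-start gradient descent included), and the stand-in classifiers in the assertions are purely illustrative:

```python
import numpy as np

def estimate_lm_dimension(train_and_test, k, n_max, n_patterns=10,
                          n_labelings=50, rng=None):
    """Increase n until no random pattern of n points in [-0.5, 0.5]^k can be
    shattered (100% training accuracy for every tested labeling); test all 2^n
    labelings for n <= 6, else a sample of n_labelings random ones."""
    rng = rng or np.random.default_rng(1)
    d_lm = 0
    for n in range(1, n_max + 1):
        shattered = False
        for _ in range(n_patterns):                       # up to 10 random patterns
            X = rng.uniform(-0.5, 0.5, (n, k))
            if n > 6:
                labelings = [tuple(rng.integers(0, 2, n)) for _ in range(n_labelings)]
            else:
                labelings = [tuple(map(int, np.binary_repr(b, n))) for b in range(2 ** n)]
            if all(train_and_test(X, np.array(y)) for y in labelings):
                shattered = True
                break
        if not shattered:
            break
        d_lm = n
    return d_lm

# stand-in classifiers: one that always shatters, one that fails beyond 3 points
assert estimate_lm_dimension(lambda X, y: True, k=2, n_max=5) == 5
assert estimate_lm_dimension(lambda X, y: len(y) <= 3, k=2, n_max=6) == 3
```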
In Table I we present the results of the empirical estimation of the LM dimension for a qubit. The generators $\hat{g}_i$ of the algebra for a qubit system are identified with the Pauli operators: $\hat{g}_1 = \hat{\sigma}_x$, $\hat{g}_2 = \hat{\sigma}_y$, $\hat{g}_3 = \hat{\sigma}_z$. For all qubit schemes under study, the classification is performed by measuring the operator $\hat{g}_3$, with eigenvalues $\{-1, 1\}$ and corresponding eigenvectors $\{|1\rangle, |0\rangle\}$. The two groups of data are separated according to $\langle\hat{g}_3\rangle \gtrless 0$. For comparison with our single-layer model, we have also included models (E and F in Table I) which implement re-uploading of the input data [6,17].
We proceed with the estimation of $\widetilde{D}_{LM}$ for a qutrit, with results presented in Table II. The generators $\hat{g}_i$ of the $SU(3)$ group can be chosen to be the Gell-Mann operators $\hat{\lambda}_i$, $i = 1, \ldots, 8$, which are provided in matrix form in the Appendix. According to Section II, during the encoding phase the classical data are mapped onto the Bloch hyper-sphere of a qutrit, embedded in the 8-dimensional space, which cannot be visualized. To obtain a partial visualization, as for example in Section V A 1, we use the Bloch-ball representation offered by the $su(2)$ subalgebra of $su(3)$ spanned by the generators $\hat{L}_x = \hat{\lambda}_1 + \hat{\lambda}_6$, $\hat{L}_y = \hat{\lambda}_2 + \hat{\lambda}_7$, $\hat{L}_z = \hat{\lambda}_3 + \sqrt{3}\,\hat{\lambda}_8$. For all schemes, we choose to measure the operator $\hat{L}_z = \hat{\lambda}_3 + \sqrt{3}\,\hat{\lambda}_8$, which is diagonal in the computational basis and has the uniform spectrum $\{-2, 0, 2\}$. For binary-classification results, as shown in Table II, we separate the two groups according to the sign of $\langle\hat{L}_z\rangle$. With regard to efficiency, the single-layer schemes that achieve $\widetilde{D}_{LM} = 2P$ can be considered the most successful ones, i.e., qubit: A, G; qutrit: D2. From the qubit models C, D and the qutrit model D1 we may conclude that neither the absence nor an excess of parameters in the encoding phase is recommended. We also observe that the most successful single-layer models are the ones with $k \approx d^2-1$.
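The diagonal part of this construction can be verified directly; the snippet below (a minimal numerical check) builds $\hat{\lambda}_3$ and $\hat{\lambda}_8$ and confirms that $\hat{L}_z = \hat{\lambda}_3 + \sqrt{3}\,\hat{\lambda}_8$ has the uniform spectrum $\{-2, 0, 2\}$ quoted above:

```python
import numpy as np

# the two diagonal Gell-Mann operators of su(3)
l3 = np.diag([1.0, -1.0, 0.0])
l8 = np.diag([1.0, 1.0, -2.0]) / np.sqrt(3)

# measured observable: diagonal in the computational basis
Lz = l3 + np.sqrt(3) * l8
assert np.allclose(np.diag(Lz), [2.0, 0.0, -2.0])                  # diag(2, 0, -2)
assert np.allclose(sorted(np.linalg.eigvalsh(Lz)), [-2.0, 0.0, 2.0])  # uniform spectrum
```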
For classical neural networks with a fixed input dimension $k$, one can linearly augment the LM dimension with the number of parameters by adding hidden layers [16]. For the model presented here, this becomes possible by using a qudit system where $k < d^2-1$. The scaling $\widetilde{D}_{LM} = 2P$ is not maintained; one rather achieves $\widetilde{D}_{LM} \approx P$, as shown for $k = 2$ with the qutrit model B. We have also implemented classification with $k = 2$ for a 4-level system (not shown), where $k = 2$, $P = 13$ and $\widetilde{D}_{LM} = 10$. An alternative way to increase the LM dimension is to use re-uploading, see qubit models E and F, but there the scaling $\widetilde{D}_{LM} = 2P$ is also not achieved, but rather $\widetilde{D}_{LM} = P + L$, with $L$ a constant. (We have also implemented 4-layer re-uploading, extending models E and F with $k = 2$, $P = 8$, where we estimated $\widetilde{D}_{LM} = 11$; not shown in the tables.) This analysis confirms the findings of [18] and underlines the need for more research in identifying quantum models which exceed the classical limits.
Finally, for single-layer models with $k \approx d^2-1$, we see that $\widetilde{D}_{LM}$ is higher than that of the classical neural network. It would be interesting to see whether more exotic classical perceptron models, such as product units [21,22] or the complex-valued perceptron [23], exhibit a similar augmentation of the LM dimension. In addition, we underline the fact that the LM dimension captures only a specific aspect of the model. For a complete evaluation of the quantum model for supervised learning tasks, other aspects [24] would have to be taken into account, e.g., difficulty in training (the barren-plateaus problem), presence of noise in the implementation, etc. The conclusions of the numerical studies on the LM dimension are illustrated with examples in the next sections. Since the LM dimension only concerns the capacity for binary classification tasks, we there address classification problems with $M > 2$ classes as well.

IV. CLASSIFICATION PROBLEMS TREATED WITH A QUBIT
We start the illustration of the suggested method by addressing two typical classification problems with a qubit ($d = 2$). Even though the power of a qubit has been extensively studied in the literature, this is the first example showing that a qubit can be logically complete, i.e., that it is able to implement all binary logical functions. This is achieved with model A, Table I, which contains two real parameters. This outcome does not come as a surprise, since model A has LM dimension $\widetilde{D}_{LM} = 4$ for $k = 2$; in other words, it can shatter four 2-dimensional vectors in random position in all possible ways.

A. Binary logical functions
Let us consider four data sets on a plane ($k = 2$), as shown in Fig. 1(a). The logical functions for these noisy data correspond to different attributions of each data set to one of two groups, A and B. For instance, the XOR function requires a classification of the data sets as in Fig. 1(a).
To implement classification according to the logical functions, we first map the data onto the 2-dimensional surface of the Bloch sphere. Even if the feature space has the same dimension as the initial space, the change in topology proves to be helpful. Numerical tests show that all logical functions can be implemented this way with 2 real weights ($S = 1$ and $W = 1$). In more detail, we use the encoding and rotation of model A for a qubit, see Table I, and the classification is conducted using the sign of $\langle\hat{g}_3\rangle$.
We successfully solved the classification problems for all logical functions (AND, OR, XOR); however, we present in Fig. 1(b) only the results for XOR, which is the most challenging task since it is a non-linearly separable problem. The total number of data points is 2000, and we use 4% of them for the training. A success ratio of classification of 100% was readily achieved.
It is important to note here that all binary logical functions can also be solved with 2 real parameters by the complex perceptron model presented in [23]. We proceed with an example that, to our knowledge, is not solvable with any single-layer classical perceptron model.

B. Classification for circular boundaries
We proceed with a more complex classification problem and show that it can still be tackled with a single qubit.For this purpose, we employ model B, Table I, because it achieves a higher LM dimension than model A.
The problem consists of classifying the data (1000 two-dimensional vectors) in Fig. 2(a) into two groups. In Fig. 2(b), we present the classification achieved on the Bloch sphere after the weights $s_1, w_1, w_2, w_3$ have been optimized. The classification ratio achieved is 100%, using 10% of the total data set as the training set.
Following the same encoding-rotation scenario (model B), we are able to treat elliptical data (not presented here), but with a lower final classification ratio (≈ 90%).

V. EXAMPLES SOLVED WITH A QUTRIT
Even though we have been able to solve a couple of basic classification problems with one qubit, it is obvious that one needs a higher-dimensional space to resolve more complicated problems, since one qubit can accommodate at most 5 parameters/weights according to our single-layer model. As shown in Section III, qutrits may accommodate more parameters and therefore achieve a higher LM dimension. In addition, tests have shown that qutrit models perform better than qubit models for classification tasks into $M > 2$ groups. This is not obvious from studying the LM dimension alone.

A. Noisy XOR
We first investigate the binary classification task presented in Fig. 3(a), for which all qubit models exhibited low performance but where the qutrit model B, see Table II, gives adequate results. More specifically, we use 1% of the total data (2000 points) for training and achieve a success classification ratio of 96%.

Classification into three groups and a geometric picture
We increase the difficulty of the previous problem by demanding classification into 3 groups of data and reducing the margins between the sets, as shown in Fig. 4(b). We use a comparable number of weights (9), but now the encoding-rotation model is: • Rotation via a unitary generated by the operators $\hat{L}_1 = \hat{L}_x$, $\hat{L}_2 = \hat{L}_y$, $\hat{L}_3 = \hat{L}_z$.

• Measurement of $\hat{L}_z$ and classification by comparing the value of $\langle\hat{L}_z\rangle$ with threshold values.
Using 4% of the total data (2000 points) for training, a success ratio of classification of 87% is achieved on the rest of the data. In Fig. 4, we depict the mapping (with optimized parameters) of the data into the $SU(2)$ Bloch ball generated by the $\hat{L}_x$, $\hat{L}_y$, $\hat{L}_z$ operators. The classification 'intervals' for $\langle\hat{L}_z\rangle$ are also presented in the picture as horizontal lines. This 'local' picture offered by the subgroup is equivalent to the picture one would obtain by inspecting the local density matrix of an entangled system. One can thus claim, borrowing terms from the notion of generalized entanglement [25], that the self-entanglement of a qutrit has the same use in the classification procedure as physical entanglement between subsystems, i.e., it extends the mapping from the surface to the inside of the Bloch hyper-sphere of a subsystem. The generation of self-entanglement in a qudit does require the ability to fully operate the system, but in practice this is less demanding than an entangling interaction between subsystems.
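The three-class segmentation of $\langle\hat{L}_z\rangle \in [-2, 2]$ described above can be sketched as follows; the two thresholds below split the interval into equal thirds for illustration (the trained thresholds may of course differ):

```python
def classify_three(mean_lz, y1=-2.0 / 3.0, y2=2.0 / 3.0):
    """Assign a class by comparing <L_z> in [-2, 2] against thresholds y1 < y2;
    the equal-thirds values are illustrative placeholders, not trained values."""
    if mean_lz < y1:
        return 0
    return 1 if mean_lz < y2 else 2

assert [classify_three(v) for v in (-1.5, 0.0, 1.5)] == [0, 1, 2]
```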

B. Classifying moon sets with a qutrit
Finally, using the qutrit model C of Table II, we attempt a common classification task, that of the moon sets. By optimizing the 8 parameters of the model, we achieve a classification ratio of 90%, using 10% of the 800 total data points for training. In Fig. 5, we present $\langle\hat{L}_z\rangle$ for the optimized set of parameters, together with the data sets.

C. Real-world data multi-class classification
We now turn to multi-class classification tasks using real-world data and more advanced methods of training. We use data sets from the UCI Machine Learning Repository, a widely used and publicly available repository [26] maintained by the University of California, Irvine. Our aim is to explore the feasibility of using a single qutrit to accurately distinguish between three classes in data sets with more than two dimensions, such as the Iris and Wine datasets. Our results illustrate that supervised learning in the context of less structured data is achievable.
The Iris dataset consists of 150 samples of iris flowers, with measurements of four features: sepal length, sepal width, petal length, and petal width. Each sample is labeled with one of three possible iris species. The Wine Cultivars dataset consists of measurements of thirteen chemical constituents found in three different wine cultivars. The objective is to classify the cultivar of the wine based on the chemical composition measurements. Our aim is to use the same encoding and number of parameters for both data sets. Thus, for the Wine Cultivars data set, which possesses 13 different features, we employ Principal Component Analysis (PCA) [27] in order to reduce the number of features to four. The encoding and rotating scheme that we follow uses: • Rotation via a parameterized unitary, where the variational weights $\vec{w} = (s, w_1, w_2, w_3, w_4)$ are the parameters to be optimized.
We ensured an equal representation of each class. For reproducibility, we used the same seed to split the data into train and test sets. To avoid overfitting, early stopping was employed, and different gradient-based methods were tried to combat the barren-plateaus problem before settling on stochastic gradient descent (SGD) [28] using the parameter-shift rule. This method reduces the number of measurements needed during implementation compared to the standard method, making it more efficient and practical for quantum machine learning. SGD is a variant of gradient descent that randomly selects a subset of data points, called a mini-batch, to calculate the gradient of the cost function at each iteration.
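The parameter-shift rule mentioned above admits a compact single-qubit check (a standard textbook instance, not the qutrit circuit of this section): for a rotation $R_x(\theta) = e^{-i\theta\sigma_x/2}$, two circuit evaluations shifted by $\pm\pi/2$ give the exact gradient of $\langle\sigma_z\rangle$:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)

def expval(theta):
    """<0| Rx(theta)^† sigma_z Rx(theta) |0> with Rx = exp(-i theta sigma_x / 2);
    analytically this equals cos(theta)."""
    U = np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * sx
    psi = U @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ sz @ psi).real

def param_shift_grad(theta):
    """Exact gradient from two shifted circuit evaluations."""
    return 0.5 * (expval(theta + np.pi / 2) - expval(theta - np.pi / 2))

theta = 0.7
assert np.isclose(param_shift_grad(theta), -np.sin(theta))  # d/dθ cos(θ) = -sin(θ)
```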
Since these are multi-class problems, the categorical cross-entropy was used as the cost function, which combines the softmax activation and the negative log-likelihood loss as follows: $\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} t_{ij} \log p_{ij}$. Here $N$ is the number of samples, $C$ is the number of classes, $t_{ij}$ represents the true label for sample $i$ and class $j$, and $p_{ij}$ represents the predicted probability. Using this approach, we achieved competitive scores with a single qutrit, as can be seen in Table IV. In these benchmarks, we present the results of the single-qutrit model against a classical machine learning model using Support Vector Machines (SVM) and a Variational Quantum Classifier (VQC) model with entangled qubits in Qiskit. These tests were conducted using four qubits and the popular ZZ feature map with twelve parameters, utilizing the Limited-memory Broyden-Fletcher-Goldfarb-Shanno with bound constraints (L-BFGS-B) optimizer to minimize sensitivity to local minima and the barren-plateau issue [29].
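The cost function just described can be written directly in NumPy (a generic implementation, independent of the quantum model):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(t, p, eps=1e-12):
    """L = -(1/N) sum_i sum_j t_ij log p_ij for one-hot targets t."""
    return float(-np.mean(np.sum(t * np.log(p + eps), axis=1)))

t = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)          # true one-hot labels
p = softmax(np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]))  # confident correct predictions
assert categorical_cross_entropy(t, p) < 0.02              # near-zero loss when correct
```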
These results showcase that even a single-qudit classifier is capable of multi-class classification of multi-dimensional real-world data. Although on the Iris data set the four-qubit model outperformed the single-qutrit model, the single-qutrit model produced better results on the Wine data set, even with five parameters compared to the twelve used by the ZZ feature map. Increasing the number of encoding layers could further enhance the classifier's performance, but since the aim of our study was to demonstrate that a single-layer qudit classifier can accurately distinguish between multiple classes, this is not further investigated here.

VI. DISCUSSION
Qudits are extensions of qubit units to higher dimensions, which can enhance performance in quantum computing [30-32] and communication [33-35]. They are experimentally realizable with different physical models, and recent proposals also use them in quantum machine learning [17,36]. In this work, we have described a model for data classification using a single qudit. The parametrization is introduced according to geometric intuition, partially for controlling the mapping on the Bloch hyper-surface and partially for adjusting the projective measurement to the data set's orientation on the Bloch hyper-sphere. Entangling qudits or adding more layers can certainly enhance the quantum classifier, similar to how classical neural networks yield better results with increased depth. Nonetheless, given the expense and error-prone nature of entangling operations on near-term quantum hardware, our results indicate that even a low-depth single-qudit classifier holds promise for quantum machine learning, if it is thoughtfully employed with a balanced distribution of parameters between the encoding and rotating steps.
The simple model that we present shares obvious similarities with, and borrows ideas from, previous works [5-7,17]. Being, though, still midway in the exploration of the potential role of quantum systems for ML tasks, this geometrically dressed, entanglement-free proposal makes its own contribution, connecting current efforts with the geometry of the Hilbert-Schmidt space and underlining the practical equivalence of self-entanglement [25] with physical entanglement. In addition, with the help of the empirical estimation of the LM dimension for a qubit and a qutrit, we have been able to demonstrate that the 'capacity' of single-layer quantum systems can be higher than that of classical neural network systems bearing the same number of training parameters. It remains an open question for future work to investigate and compare the capacity of the quantum model with more intricate single-layer classical perceptron models, and also to investigate whether quantum multi-layer structures exist which can keep the advantage in LM dimension over classical NNs.

Let $\{|k\rangle\}_{k=0}^{d-1}$ be such a set of normalized eigenstates. Then one can express a generic qudit state $|\psi\rangle = \sum_{k=0}^{d-1} c_k |k\rangle$ (1) by $d$ complex amplitudes $c_k$ over this basis, constrained by the normalization condition $\sum_{k=0}^{d-1} |c_k|^2 = 1$.

FIG. 1: (a) Data to be classified according to the XOR logical function into groups A and B. (b) The classified data mapped on the surface of the Bloch sphere (projection on the x-z plane) after training of the 2 weights has been performed.

FIG. 2: (a) The initial data (1000 points) to be classified into groups A and B. (b) The data mapped on the Bloch surface and perfectly classified.

FIG. 4: The classification of the data of Fig. 4(b) into 3 groups, as perceived in the $SU(2)$ Bloch sphere representation provided by the operators $\hat{L}_x$, $\hat{L}_y$, $\hat{L}_z$.

FIG. 5: Classification of the moon sets with the qutrit model C using 8 weights. A contour plot of $\langle\hat{L}_z\rangle$ is depicted together with the moon data after the optimization has been performed.

TABLE II: Qutrit models.

TABLE III: Real-world data treated with a qutrit.

TABLE IV: Comparative numerical studies for the classification of the Iris and Wine data. The train set accuracy (TrSA) and test set accuracy (TeSA) reached with different methods.