1 Introduction

The present paper deals with a general analysis of quantitative structure–property relationships (QSPR).Footnote 1 The discussion is restricted to molecular spaces,Footnote 2 rather than the descriptor or parameter spaces that form the usual framework of the mainstream literature.

Since the beginning of the twenty-first century, many papers dealing with classical QSPR (CQSPR) have been published, and several of them have attracted our attention [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]; the same list also includes work on the mathematical problems associated with this class of techniques.

Since the modest origins of quantum similarity [41], and throughout the subsequent collection of quantum QSPR (QQSPR) studies [42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94], as far as the present author can tell, no specific recommendations have been discussed about how to prepare the whole data set usually involved in QSPR procedures. This intriguing gap is what the present study tries to address.

Some recommendations for processing QSPR data have been discussed previously; see reference [95]. However, some essential clues on data rearrangement may still be missing from the list of actions that one might suggest based on the available QSPR information.

Any such data rearrangements should be built as a collection of simple procedures leading to a reproducible computational structure, while also allowing systematic comparison with any QSPR problem that may appear in the future. Reproducibility does not seem to be a strongly considered issue in CQSPR; see, for example, the contributions in reference [7]. Nevertheless, it is undoubtedly an essential requirement of scientific work.

Intermediate data preparation procedures also seem poorly defined in the QSAR-QSPR literature, so this subject will be discussed in the present work. In this author's opinion, QSPR results must be reproducible and comparable, and a general-purpose data preparation stage in QSPR procedures could be a first step toward that goal.

This point of view requires, or leads to, new data manipulations and algorithmic descriptions, which will also be discussed. In this sense, the study provides a general isometry algorithm that is helpful for calculating the molecular set momenta and provides the basis for constructing the QSPR operator.

Finally, there is no claim that the algebraic proposals described below are final, or the only way to obtain reliable and well-structured QSPR. On the contrary, they might be considered an open path toward perfecting generic QSPR algorithms in molecular spaces.

Thus, this study opens the possibility of describing an alternative to the parameter space calculations of current QSPR procedures.

2 Preliminary considerations

The QSPR methodological proposal considered here starts by admitting the following points as a working background:

  (a) Suppose that a set of \(M\) molecular structures, \({\mathbb{M}} = \left\{ {m_{I} \left| {I = 1,M} \right.} \right\}\), is known; in principle, its elements can be considered arbitrarily ordered and numbered accordingly.

  (b) Suppose that a set of known property values, \({\mathbf{P}} = \left\{ {p_{I} \left| {I = 1,M} \right.} \right\}\), is associated with the molecular set \({\mathbb{M}}\). These values are then standardized following the suggestions of reference [95], that is, transformed into values within the unit interval.

Therefore, from now on, one can also take for granted that in any QSPR procedure the property set lies in the compact unit interval: \({\mathbf{P}} \Rightarrow {{\varvec{\Pi}}} \subset \left[ {0,1} \right].\)

Such a transformation is essential for comparing the predicted property values obtained by applying QSPR algorithms. This standardization was suggested [95] to bring a given QSPR problem into the form most easily compared with any other QSPR problem in the future.

This transformation of the property values makes any QSPR problem homogeneous with respect to the property of any other problem.
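As an illustration of point (b), the sketch below maps a list of property values onto \([0,1]\). It assumes plain min-max scaling, \(\pi_I = (p_I - \min p)/(\max p - \min p)\), purely for illustration; reference [95] may recommend a different transformation, and both the function name `standardize_unit_interval` and the data are hypothetical.

```python
def standardize_unit_interval(values):
    """Map property values linearly onto [0, 1] by min-max scaling.

    Assumption: min-max scaling is only one simple way of reaching the
    unit interval; the transformation recommended in [95] may differ.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all property values are equal; the scaling is undefined")
    return [(p - lo) / (hi - lo) for p in values]


# Hypothetical property values p_I for M = 4 molecules.
props = [2.5, -1.0, 4.0, 1.5]
pi = standardize_unit_interval(props)
print(pi)  # [0.7, 0.0, 1.0, 0.5]
```

The smallest property value is sent to 0 and the largest to 1, so two problems standardized this way share the same property range.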

  (c) Suppose that a QSPR procedure is prepared so that the descriptors of the molecular set are employed to construct a molecular space, denoted \({\mathbb{V}}_{D \ge M}\), whose dimension D is imposed by the number of different descriptors of the molecular structuresFootnote 3 [85].

This point is systematically ignored in CQSPR, as the usual classical procedures describe such problems as belonging to the descriptor or parameter space.

In principle, the molecular descriptors must be arranged so that every molecule of the set \(\mathbb{M}\) becomes associated with a vector of sufficiently large, otherwise arbitrary, dimension. This defines a column vector set, \({\mathbb{D}} = \left\{ {\left| {{\mathbf{d}}_{I} } \right\rangle \left| {I = 1,M} \right.} \right\}\), in one-to-one correspondence with the molecular set \({\mathbb{M}}\), that is:

$$\forall I = 1,M:m_{I} \in {\mathbb{M}} \wedge \left| {{\mathbf{d}}_{I} } \right\rangle \in {\mathbb{D}} \Rightarrow m_{I} \leftrightarrow \left| {{\mathbf{d}}_{I} } \right\rangle \to {\mathbb{M}} \Leftrightarrow {\mathbb{D}}$$
(1)

Consequently, this is equivalent to assuming that in a general QSPR formalism, one describes every molecule by a finite-dimensional vector in CQSPR or by an infinite-dimensional density function in QQSPR.

Such a description allows the molecular structures of point (a) to be considered elements of a molecular spaceFootnote 4 of the appropriate dimension.

Although such a molecular vector space definition appears to be systematically ignored in the usual literature, molecular structures described as vectors are unavoidably present in QSPR calculations.

Even when AI manipulations lead to black-box QSPR-like results, this vector-molecule information source cannot be avoided; see, for example, references [6, 25, 26, 31, 32, 34, 36] for more evidence.

3 Systematic ordering of any molecular set

Taking the three points of the previous section as a sound working foundation for the discussion of QQSPR problems, one can systematically obtain a general QSPR ordering procedure.

First, the self-similarities of the molecular set can be computed from its descriptor elements; then the molecules can be ordered according to these values. This simple process generates a new order in the original molecular set.

The ordering of the whole QSPR data set, as accepted by definition in point (a) of the previous section, can be considered arbitrary, as is usual in the QSPR literature.

Nevertheless, combining this reordering with the unit-interval standardization of the known property values, indicated in point (b) of the previous section, should allow the data of a particular QSPR calculation to be compared easily with those of any other.

This kind of systematic standardization could be performed even if the compared problems correspond to different molecular sets or contain heterogeneous elements.

From a logical point of view, therefore, it could be very useful to prepare QSPR procedures systematically, performing this groundwork in the same way before obtaining the QSPR operator or function that relates the algebraic representation of the molecular set structure to its properties, regardless of the nature of the property studied.

Every QSPR calculation can thus be made comparable with any other: not only can the set of molecular property values \({\mathbf{P}}\) be transformed into a normalized one, as indicated in reference [95], but the molecular ordering can also be standardized, using selfsimilarities for this task.

A systematic reordering of the involved QSPR molecular set \({\mathbb{M}}\) would permit us to easily compare a particular calculation with any other submitted to an equivalent treatment.

3.1 QSPR ordering algorithm

Thus, one can propose a simple and effective universal ordering of the descriptor data set \(\mathbb{D}\) of any QSPR problem. To achieve this purpose, the following procedure can be applied.

The QSPR ordering algorithm starts from the elements of the molecular descriptor set \(\mathbb{D}\). The norms of the vector representation of every molecule are easily obtained, and their values define an additional set:

$${\mathbb{S}} = \left\{ {s_{I} = \left\langle {{\mathbf{d}}_{I} } \right|\Gamma_{M} \left| {{\mathbf{d}}_{I} } \right\rangle \left| {I = 1,M} \right.} \right\}.$$
(2)

where one defines the metric signature matrix \(\Gamma_{M}\) as a diagonal matrix:

$$\Gamma_{M} = Diag\left( {\gamma_{I} \left| {I = 1,M} \right.} \right),$$
(3)

such that all its diagonal elements are positive or negative units, \(\gamma_{I} = \pm 1\). That is, one can define the following dimensions and diagonal matrices:

$$M = P + N:\Gamma_{M} = \Gamma_{P} \oplus \Gamma_{N} \leftarrow \Gamma_{P} = {\mathbf{I}}_{P} \wedge \Gamma_{N} = - {\mathbf{I}}_{N} .$$
(4)

Thus, one can also consider the diagonal matrix \(\Gamma_{M}\) as isomorphic to an M-dimensional column vector \(\left| {\Gamma_{M} } \right\rangle\), with its elements containing the diagonal elements of the matrix \(\Gamma_{M}\), or as a matrix signature in the sense of reference [96].

On the other hand, the norm set \(\mathbb{S}\) in Eq. (2) can also be considered to be in one-to-one correspondence with the molecular set: \({\mathbb{M}} \Leftrightarrow {\mathbb{S}}.\)

Admitting that norms can be calculated for every molecular descriptor vector amounts to regarding such vectors as elements of a Banach space associated with the QSPR problem studied.

Proceeding in this way, the QSPR working space is not only a Banach space but also a Minkowski space, which can be transformed into a Euclidean oneFootnote 5 if needed. This happens when the negative part \(\Gamma_{N}\) of the metric signature matrix is absent, so that the following equality holds:

$$M = P:\Gamma_{M} = \Gamma_{P} = {\mathbf{I}}_{M} ,$$
(5)

where

$${\mathbf{I}}_{M} = \left\{ {I_{IJ} = \delta \left( {I = J} \right)\left| {I,J = 1,M} \right.} \right\}$$
(6)

is the \(\left( {M \times M} \right)\) unit matrix.Footnote 6

Now, adopting the vocabulary of quantum similarity, the \(\mathbb{S}\) set of the norms of the molecular vector representations contains the selfsimilarities attached to each element of the molecular set.

Suppose each molecular descriptor corresponds to a function, as in QQSPR, where electronic density functions are employed as molecular descriptor vectors; see, for example, references [42,43,44, 86]. In that case, the Euclidean norm of a finite-dimensional CQSPR vector is replaced by the integral of the squared QQSPR function.

According to the above definition, the norm set \(\mathbb{S}\) is made of real numbers (rational ones, in computational practice), which are all positive when a Euclidean metric is used.

Hence, the set \(\mathbb{S}\) can easily be reordered from, say, the smallest to the largest norm (or vice versa). The reordered subindices then mean that:

$$s_{1} < s_{2} < \cdots < s_{M - 1} < s_{M} \Rightarrow m_{1} \prec m_{2} \prec \cdots \prec m_{M - 1} \prec m_{M} ,$$
(7)

where the symbol \(... \, m_{I} \prec m_{I + 1} \, ...\) means that the molecule \(m_{I}\) on the left precedes the molecule \(m_{I + 1}\) on the right.

From now on, we suppose that this supplementary ordering of the molecular set \(\mathbb{M}\) has been completed before any proper QSPR calculation is performed.

The set \(\mathbb{M}\) is thus reordered in such a way that:

$$m_{1} \to s_{1} = \mathop {\min }\limits_{I} \left[ {s_{I} } \right],$$
(8)

while:

$$m_{M} \to s_{M} = \mathop {\max }\limits_{I} \left[ {s_{I} } \right]$$
(9)

or the other way around if necessary.

Since the QSPR data and the elements of the set \(\mathbb{M}\) can be ordered in this manner, a fourth point can be added to the three initial conditions proposed in the previous section:

  (d) QSPR calculations can always be performed systematically on a molecular set \(\mathbb{M}\) ordered according to increasing (alternatively, decreasing) values of the molecular selfsimilarities.

Moreover, according to point (b), this will be done together with a set of standardized property values, which must be reordered accordingly.
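Points (b) and (d) together can be sketched in Python under a few assumptions: descriptors are plain lists of numbers, the diagonal signature \(\Gamma\) is given as a list of +1/-1 entries with the same length as each descriptor vector, and molecules are sorted by \(s_I = \langle {\mathbf{d}}_I | \Gamma | {\mathbf{d}}_I \rangle\) as in Eqs. (2) and (7). The names `selfsimilarity` and `order_molecules` and the numerical data are hypothetical.

```python
def selfsimilarity(d, gamma):
    """s_I = <d_I| Gamma |d_I>, with the diagonal signature given as a list of +1/-1."""
    if len(d) != len(gamma):
        raise ValueError("signature length must match the descriptor dimension")
    return sum(g * x * x for g, x in zip(gamma, d))


def order_molecules(descriptors, props, gamma, decreasing=False):
    """Reorder molecules, and their standardized properties, by selfsimilarity.

    Returns the permutation of the original indices together with the
    reordered selfsimilarities and property values. Python's sort is stable,
    so molecules with equal s_I keep their original relative order.
    """
    s = [selfsimilarity(d, gamma) for d in descriptors]
    order = sorted(range(len(descriptors)), key=lambda i: s[i], reverse=decreasing)
    return order, [s[i] for i in order], [props[i] for i in order]


# Hypothetical descriptors (dimension 3) for M = 3 molecules and standardized properties.
descriptors = [[1.0, 2.0, 0.0], [0.5, 0.5, 0.5], [3.0, 0.0, 1.0]]
props = [0.2, 1.0, 0.0]
order, s_sorted, p_sorted = order_molecules(descriptors, props, gamma=[1, 1, 1])
print(order, s_sorted, p_sorted)  # [1, 0, 2] [0.75, 5.0, 10.0] [1.0, 0.2, 0.0]
```

Eq. (7) assumes strict inequalities; when two selfsimilarities coincide, the stable sort simply keeps the input order, and a different tie-breaking rule may be preferable.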

4 Selfsimilarities and the similarity matrix (re-) and (de-) construction

Self-similarities thus play an essential role in QSPR procedures as a simple source of systematic data reordering. They have been studied in depth within the QQSPR framework [54, 97] but have apparently been ignored in CQSPR, perhaps because classical procedures overlook the molecular space.

This section develops a general procedure valid for both practical computational branches of QSPR.

4.1 Definition of the similarity matrix

The selfsimilarity set \(\mathbb{S}\) corresponds to the diagonal elements of a symmetric matrix, the so-called similarity matrix (see, for example, reference [60]). This is an \(\left( {M \times M} \right)\) array that can be constructed to hold all possible scalar products between pairs of molecular descriptor vectors:

$${\mathbf{Z}} = \left\{ {z_{IJ} = \left\langle {{\mathbf{d}}_{I} } \right|\Gamma_{M} \left| {{\mathbf{d}}_{J} } \right\rangle \left| {I,J = 1,M} \right.} \right\} = {\mathbf{Z}}^{T} ,$$
(10)

where one has explicitly used the diagonal metric signature matrix \(\Gamma_{M}\) to obtain, in this manner, a general picture of the similarity matrix computation.

Owing to this definition, one can also write the selfsimilarities set \(\mathbb{S}\) as the diagonal elements of the similarity matrix:

$$\forall I = 1,M:s_{I} = z_{II} \Leftrightarrow {\mathbb{S}} = Diag\left( {\mathbf{Z}} \right).$$
(11)

The possibility of constructing the similarity matrix as in Eq. (10) allows the molecular space to be associated with a Minkowskian pre-Hilbert space [83].
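A minimal sketch of Eq. (10), assuming a diagonal signature given as a list of ±1 entries. The text writes \(\Gamma_{M}\); here its size is taken to be the descriptor dimension, so that each product is well defined. The sketch also checks \({\mathbf{Z}} = {\mathbf{Z}}^{T}\) and reads the selfsimilarities off the diagonal as in Eq. (11). The function name and descriptor values are illustrative only.

```python
def similarity_matrix(descriptors, gamma):
    """Z with z_IJ = <d_I| Gamma |d_J> for a diagonal signature gamma (entries +1/-1)."""
    def product(a, b):
        return sum(g * x * y for g, x, y in zip(gamma, a, b))
    return [[product(a, b) for b in descriptors] for a in descriptors]


descriptors = [[1.0, 2.0, 0.0], [0.5, 0.5, 0.5], [3.0, 0.0, 1.0]]  # hypothetical
Z = similarity_matrix(descriptors, gamma=[1, 1, 1])
M = len(Z)
assert all(Z[i][j] == Z[j][i] for i in range(M) for j in range(M))  # Z = Z^T
S = [Z[i][i] for i in range(M)]  # Eq. (11): the selfsimilarities are Diag(Z)
print(Z)  # [[5.0, 1.5, 3.0], [1.5, 0.75, 2.0], [3.0, 2.0, 10.0]]
print(S)  # [5.0, 0.75, 10.0]
```

Passing a signature such as `[1, 1, -1]` gives the Minkowskian variant of the same matrix without changing anything else.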

4.2 Similarity matrix properties

Moreover, the similarity matrix coincides with the metric matrix of the vector (sub)space generated by the descriptor set \(\mathbb{D}\). This is so because, if every molecule in the set \(\mathbb{M}\) differs from the rest (a more than reasonable general situation in any QSPR data setup), then every molecular descriptor vector must be linearly independent of the other descriptor vectors; for a discussion of this relevant point, see, for example, reference [84].

If the elements of the set \(\mathbb{D}\) do not have this algebraic property, then, in CQSPR procedures, one faces the so-called dimensionality paradox [85].

One might state such a condition attached to the descriptor set \(\mathbb{D}\) as a point added to the four previously discussed ones:

  (e) The molecular descriptor set \(\mathbb{D}\) in each QSPR problem must always be constructed as a set of linearly independent vectors in the molecular space.

As commented before, to avoid the dimensionality paradox, the dimension D of the vector space containing them must satisfy \(D \ge M\).

This necessary condition is unrelated to the possibility of treating the CQSPR problem as a subject of statistical study in parameter space; instead, it constitutes an unavoidable preliminary condition attached to any mathematical description of a molecular set.
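Point (e) can be checked numerically. The sketch below, a hypothetical helper rather than a prescribed procedure, computes the rank of the descriptor set by Gaussian elimination over exact rationals (Python's `fractions.Fraction`) and accepts the set only if its M vectors are linearly independent, which already forces \(D \ge M\). Because the test is exact on the stored numbers, it does not flag nearly dependent vectors; that situation belongs to the numerical issues of Sect. 4.6 and would need a tolerance-based criterion.

```python
from fractions import Fraction


def exact_rank(vectors):
    """Rank of a list of vectors by Gaussian elimination over the rationals,
    so that no floating-point tolerance is needed."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][c] != 0:
                f = rows[i][c] / rows[rank][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


def satisfies_point_e(descriptors):
    """M descriptor vectors of dimension D: require D >= M and full rank M."""
    M, D = len(descriptors), len(descriptors[0])
    return D >= M and exact_rank(descriptors) == M


print(satisfies_point_e([[1.0, 2.0, 0.0], [0.5, 0.5, 0.5], [3.0, 0.0, 1.0]]))  # True
print(satisfies_point_e([[1, 2, 0], [2, 4, 0], [0, 0, 1]]))  # False: dependent vectors
print(satisfies_point_e([[1, 0], [0, 1], [1, 1]]))  # False: D < M
```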

4.3 Successive approximations to the similarity matrix

Once the set of descriptor vectors \(\mathbb{D}\) has been constructed and the similarity matrix computed via Eq. (10), some nuances related to successive approximations of the matrix Z can be put forward.

The set of diagonal elements of the similarity matrix can be accepted as a zeroth-order approximation; the corresponding diagonal zeroth-order similarity matrix is denoted by the symbol \({\mathbf{Z}}_{0}\).

Starting from this diagonal zeroth-order matrix, the next step adds the non-zero elements of the first sub-diagonals of the similarity matrix:

$$I = 1,M - 1:z_{I,I + 1} = z_{I + 1,I} .$$
(12)

This constructs a tridiagonal matrix, which can be named the first-order similarity matrix and denoted \({\mathbf{Z}}_{1}\).

Proceeding in this way, and adding the next non-zero subdiagonals to the first-order matrix, larger band matrices can be constructed stepwise. This yields a sequence of M similarity matrices, which can be written as follows:

$$\left\{ {{\mathbf{Z}}_{0} ;{\mathbf{Z}}_{1} ;{\mathbf{Z}}_{2} ;...{\mathbf{Z}}_{M - 2} ;{\mathbf{Z}}_{M - 1} \equiv {\mathbf{Z}}} \right\}.$$
(13)

To describe a simple algorithm leading to the sequence (13) of approximate similarity matrices, one can first define a sequence of M matrices, each holding as its only non-zero elements either the diagonal (P = 0) or the P-th pair of sub- and super-diagonals:

$$\left\{ {{\mathbf{D}}_{P} \left| {P = 0,M - 1} \right.} \right\} \to {\mathbf{D}}_{P} = \left\{ {z_{I,I + P} \wedge z_{I + P,I} \left| {I = 1,M - P} \right.} \right\}.$$
(14)

In this way, the P-th order approximation of the sequence (13) can be written as follows:

$$\forall P = 0,M - 1:{\mathbf{Z}}_{P} = \sum\limits_{K = 0}^{P} {{\mathbf{D}}_{K} } .$$
(15)

Such a simple (re-)construction thus builds a sequence of similarity matrix approximations that starts from the self-similarity diagonal and ends with the full similarity matrix. The final stage, yielding the exact similarity matrix, is reached when \(P = M - 1\); in this last step, the matrix \({\mathbf{D}}_{M - 1}\) possesses only two non-zero elements: \(z_{1,M} = z_{M,1} .\)

Conversely, a convenient (de-)construction of the similarity matrix can be obtained by successively subtracting from it the sub-diagonal matrices defined in Eq. (14).
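The band matrices of Eqs. (14) and (15) can be sketched as below. The helper names are hypothetical, and indices are 0-based, so \(z_{1,M}\) of the text is `Z[0][M-1]` here. The assertions check that the last approximation recovers the full matrix and that removing \({\mathbf{D}}_{M-1}\) from Z leaves \({\mathbf{Z}}_{M-2}\), which is one step of the (de-)construction.

```python
def sub_diagonal_matrix(Z, P):
    """D_P of Eq. (14): keep only z_{I,I+P} and z_{I+P,I}; D_0 is the diagonal."""
    M = len(Z)
    return [[Z[i][j] if abs(i - j) == P else 0.0 for j in range(M)] for i in range(M)]


def band_approximation(Z, P):
    """Z_P of Eq. (15): D_0 + D_1 + ... + D_P, i.e. every z_IJ with |I - J| <= P."""
    M = len(Z)
    ZP = [[0.0] * M for _ in range(M)]
    for K in range(P + 1):
        DK = sub_diagonal_matrix(Z, K)
        ZP = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(ZP, DK)]
    return ZP


def subtract(A, B):
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(A, B)]


Z = [[5.0, 1.5, 3.0], [1.5, 0.75, 2.0], [3.0, 2.0, 10.0]]  # hypothetical similarity matrix
M = len(Z)
Z0, Z1 = band_approximation(Z, 0), band_approximation(Z, 1)  # diagonal, tridiagonal
assert band_approximation(Z, M - 1) == Z  # the sequence (13) ends with the exact Z

# (De-)construction: removing D_{M-1} from Z leaves Z_{M-2}.
assert subtract(Z, sub_diagonal_matrix(Z, M - 1)) == band_approximation(Z, M - 2)
print(Z1)  # [[5.0, 1.5, 0.0], [1.5, 0.75, 2.0], [0.0, 2.0, 10.0]]
```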

4.4 Eigenvalues and eigenvectors of the similarity matrix

The eigenvalues and eigenvectors of the similarity matrix have been proposed as building blocks for the algebraic construction of statistical-like moments of the molecular set \(\mathbb{M}\) [88].

The momenta of the molecular set can also be used to construct a QSPR operator able to compute a chosen molecular property; see reference [78]. Furthermore, condensed scalar momenta can be used to represent the molecular set \(\mathbb{M}\) geometrically and numerically [88,89,90].

Moreover, by transforming the eigenvectors of the quantum similarity matrix into an isometric vector set, the resulting M-dimensional vector set can be taken as a finite-dimensional representation of the infinite-dimensional quantum description of the molecular set \(\mathbb{M}\).

A particularly interesting computational fact concerns the zeroth-order approximation.

Indeed, the matrix \({\mathbf{Z}}_{0}\) is diagonal; thus, its eigenvalues are the diagonal elements themselves, and its eigenvector matrix is \({\mathbf{I}}_{M}\), the \(\left( {M \times M} \right)\) unit matrix, which has the same dimension as the eigenvector matrix of any higher-order approximation.

Therefore, calculating the moments of this zeroth-order similarity matrix is equivalent to computing the statistical moments of the scalar set made of the self-similarities \(\mathbb{S}\).
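For the zeroth-order case, the sketch below assumes ordinary population moments (normalized by 1/M) of the selfsimilarities. The statistical-like moments defined in [88] may be normalized differently, so this only illustrates the reduction to a scalar set. The helper `central_moment` and the values in `S` are hypothetical.

```python
from statistics import fmean


def central_moment(values, k):
    """k-th central moment (1/M) * sum_I (s_I - mean)^k of a scalar set."""
    mu = fmean(values)
    return fmean([(v - mu) ** k for v in values])


S = [0.75, 5.0, 10.0]  # hypothetical ordered selfsimilarities, i.e. Diag(Z_0)
# Z_0 is diagonal: its eigenvalues are the s_I and its eigenvectors form I_M,
# so the zeroth-order moments reduce to moments of the scalar set S.
print(fmean(S), central_moment(S, 2), central_moment(S, 3))
```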

Like the tridiagonal first-order matrix, the higher-order elements of the sequences (13) and (15) are so-called symmetric band matrices.

4.5 Spectral indefiniteness of the similarity matrices and isometric vector representations

A remark can now be made concerning the eigenvalues of the approximate similarity matrix sequence (13).

The zeroth-order and the final exact similarity matrices are positive definite by construction whenever a Euclidean metric signature matrix \(\Gamma_{M} = {\mathbf{I}}_{M}\) is chosen.

In some molecular QQSPR cases, the superposition of the involved molecules performed to obtain optimal similarity integrals [97, 98] might result in a non-definite exact similarity matrix [91, 92].

Such incidental spectral non-definiteness might also appear in some or all of the intermediate band similarity matrices, possibly starting with the tridiagonal first-order approximation.

Whenever the eigensystem is non-definite, the isometric vectors [89] needed to obtain the statistical-like moments of the molecular descriptor set can be computed (for a summary of the computational details, see reference [91]) via the so-called synisometry procedure [92], which uses the absolute values of the negative eigenvalues to avoid complex algebra.

However, a better way to proceed consists of using the scalar product definition that computes the elements of the similarity matrices in a Minkowskian Banach space, as explained in Sect. 3.1. For this purpose, a diagonal metric signature matrix holding the appropriate signs is assigned to the space directions; see [99] for extended mathematical details.

Of course, the usual Euclidean space is retrieved when the diagonal metric matrix coincides with the unit matrix of the appropriate dimension.

The possibility of working within a Minkowskian metric Banach space has been ignored in CQSPR, while this possible framework has been recently put forward in QQSPR [99].

Working in Minkowskian spaces in both QSPR contexts could, through a simple mathematical variation, improve the chances of finding better functions relating molecular structure to properties.

4.6 Numerical problems concerning the similarity matrix when considered as a metric matrix

Problems of numerical instability in matrix diagonalization are well known in numerical linear algebra and have been studied for a long time; see, for example, reference [100].

A typical case study is the Hilbert matrix [101], a positive definite metric matrix made of the scalar products of the basis elements of a polynomial vector space; for the monomials \(x^{I - 1}\) on \(\left[ {0,1} \right]\), its elements are \(h_{IJ} = \left( {I + J - 1} \right)^{ - 1}\). Depending on the machine precision and the chosen dimension, the Hilbert matrix becomes non-definite, or even singular, for practical computational purposes.
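The Hilbert matrix behavior can be illustrated in Python. The sketch builds \(h_{IJ} = 1/(I+J-1)\), the Gram matrix of the monomials \(1, x, \ldots, x^{n-1}\) on \([0,1]\), and computes its determinant both exactly, with `fractions.Fraction`, and in double precision with the same elimination routine. The exact determinants are positive but shrink very quickly; the relative error of the floating-point value is printed for a few sizes, and no particular error values are claimed here. The function names are hypothetical.

```python
from fractions import Fraction


def hilbert(n, exact=True):
    """Hilbert matrix h_IJ = 1/(I + J - 1) (1-based): Gram matrix of x^0..x^(n-1) on [0, 1]."""
    one = Fraction(1) if exact else 1.0
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]  # 0-based i, j


def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting (Fractions or floats)."""
    A = [row[:] for row in A]
    n, det = len(A), 1
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0:
            return 0
        if p != c:
            A[c], A[p] = A[p], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return det


for n in (4, 8, 12):
    exact = determinant(hilbert(n))
    approx = determinant(hilbert(n, exact=False))
    rel_err = abs(approx - float(exact)) / float(exact)
    print(n, float(exact), rel_err)
```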

Likewise, even if the similarity matrix elements are well defined for a given QSPR problem, unexpected numerical behavior can appear when diagonalization or inversion is involved.

The dimension of the matrix and the finite machine precision can cause a positive definite matrix to behave numerically as a non-definite one.

A similar problem also occurs in systematic quantum chemical calculations involving large basis sets. There, the overlap matrix is simply the metric matrix of the atomic basis set and has well-defined elements. It must therefore be positive definite, yet in some cases it becomes computationally non-definite or even singular because of numerical instabilities of the same sort discussed here.

The quantum similarity representation of the periodic table of the elements, using a Gaussian atomic density attached to each atom, was presented in previous work [102].Footnote 7 That study encountered the same numerical problem when computing the similarity matrix, even with such a simple function used as the atomic quantum descriptor.

Thus, one must be aware that in molecular spaces, when a QSPR problem involves a large number of molecules, the complete similarity matrix could become numerically singular. In those cases, deconstructing the similarity matrix into its band approximations, as described in Sect. 4.3, could be advisable.

5 Isometric QQSPR molecular spaces

Both CQSPR and QQSPR molecular spaces can be associated with a similarity matrix constructed using the algorithm implicit in Eq. (10).

However, in the QQSPR environment, the similarity matrix can only be constructed via a set of positive integrals over pairs of molecular electronic density functions; these integrals hold a volume or measure and are not so easy to compute.

It could be interesting, without leaving the QQSPR framework, to obtain an isometric finite-dimensional representation of the molecular set involved, emulating the CQSPR background structure while providing a general, automated way to construct the molecular description vectors.

Once this isometric description is obtained, the CQSPR and QQSPR approaches stand on the same footing for the algebraic description of the molecular set.

Some attempts to reach this goal have been described recently [91, 92], but a new study [99] contains the most general and exact algorithm, as far as we know.

For conciseness, this procedure, originally developed from the vantage point of inward vector products, will be sketched here from a matrix point of view.

5.1 Towards the construction of an isometric vector set

The quantum similarity matrix Z, as defined in Eq. (10), is symmetric, that is, \({\mathbf{Z}} = {\mathbf{Z}}^{T}\). As such, it always has an attached secular equation, which can be written in matrix form:

$${\mathbf{ZU}} = {\mathbf{U}}\Theta ,$$
(16)

where \({\mathbf{U}}\) corresponds to an orthonormal matrix:

$${\mathbf{U}}^{T} = {\mathbf{U}}^{ - 1} \to {\mathbf{U}}^{T} {\mathbf{U}} = {\mathbf{UU}}^{T} = {\mathbf{I}},$$
(17)

containing the eigenvectors of Z as columns, and the matrix \(\Theta\) is diagonal:

$$\Theta = Diag\left( {\theta_{I} \left| {I = 1,M} \right.} \right),$$
(18)

which contains the eigenvalues of Z, ordered in the same way as the column vectors of U.

As commented, when it comes from a computed quantum similarity matrix, the diagonal matrix \(\Theta\) may be non-definite. It can nevertheless be rewritten with the aid of a metric signature matrix \(\Gamma_{M}\), as described earlier in Eqs. (3) and (4).

First, define the auxiliary positive definite diagonal matrix:

$$\Lambda = \left| \Theta \right| = Diag\left( {\lambda_{I} = \left| {\theta_{I} } \right|\left| {I = 1,M} \right.} \right),$$
(19)

from which the original eigenvalue matrix can be retrieved as:

$$\Theta = \Lambda \Gamma_{M} = \Gamma_{M} \Lambda .$$
(20)

In this case, the metric signature matrix can be constructed with the aid of a pair of logical Kronecker deltas:

$$\Gamma_{M} = \left\{ {\Gamma_{M;II} = \delta \left( {\theta_{I} > 0} \right) - \delta \left( {\theta_{I} < 0} \right)} \right\}.$$
(21)

With this eigenvalue rearrangement, one can also define the square root of the unsigned eigenvalues matrix:

$$\Lambda^{{\tfrac{1}{2}}} = Diag\left( {\lambda_{I}^{{\tfrac{1}{2}}} = \sqrt {\left| {\theta_{I} } \right|} \left| {I = 1,M} \right.} \right) \to \Lambda = \Lambda^{{\tfrac{1}{2}}} \Lambda^{{\tfrac{1}{2}}} ,$$
(22)

and using the commutativity of the product of diagonal matrices, one can write:

$$\Theta = \Lambda \Gamma_{M} = \Lambda^{{\tfrac{1}{2}}} \Lambda^{{\tfrac{1}{2}}} \Gamma_{M} = \Lambda^{{\tfrac{1}{2}}} \Gamma_{M} \Lambda^{{\tfrac{1}{2}}} ;$$
(23)

thus, rearranging Eq. (16) and taking into account Eqs. (17), (19), (20), (22), and (23), one can write:

$${\mathbf{Z}} = {\mathbf{U}}\Theta {\mathbf{U}}^{T} = {\mathbf{U}}\Lambda \Gamma_{M} {\mathbf{U}}^{T} = {\mathbf{U}}\Lambda^{{\tfrac{1}{2}}} \Gamma_{M} \Lambda^{{\tfrac{1}{2}}} {\mathbf{U}}^{T} = {\mathbf{D}}^{T} \Gamma_{M} {\mathbf{D}},$$
(24)

where a new matrix, holding the isometric vectors as columns, is defined as:

$${\mathbf{D}} = \Lambda^{{\tfrac{1}{2}}} {\mathbf{U}}^{T} \to \forall I = 1,M:\left| {{\mathbf{d}}_{I} } \right\rangle = \Lambda^{{\tfrac{1}{2}}} \left| {{\mathbf{u}}_{I}^{T} } \right\rangle .$$
(25)

The column vector set \({\mathbb{D}} = \left\{ {\left| {{\mathbf{d}}_{I} } \right\rangle \left| {I = 1,M} \right.} \right\} \subset {\mathbb{V}}_{M}\), which must be used to compute the scalar products with the appropriate metric signature matrix via Eq. (24), can be regarded in the same manner as the elements described in Eq. (10). Its elements constitute a set of vectors isometric to the quantum density functions representing the molecules in the studied set.

The earlier synisometric vectors (see reference [99] for more details) are constructed by transforming the metric signature matrix into a Euclidean metric matrix, that is, by simply performing the substitution \(\Gamma_{M} \to {\mathbf{I}}_{M}\) in Eq. (24). This constitutes an obvious simplification that ignores a possible Minkowskian metric, which can be crucial in some cases for obtaining a suitable molecular structure–property relation.
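A pure-Python sketch of Eqs. (16) to (25) follows. A symmetric similarity matrix is diagonalized by cyclic Jacobi rotations, chosen here only to avoid external dependencies (a library routine such as NumPy's `numpy.linalg.eigh` would normally be used); the signature follows Eq. (21), and the isometric vectors are the columns of \({\mathbf{D}} = \Lambda^{1/2}{\mathbf{U}}^{T}\). The example matrix is hypothetical and deliberately non-definite, so the sketch also shows that the synisometric substitution \(\Gamma_{M} \to {\mathbf{I}}_{M}\) no longer reproduces Z. All function names are illustrative.

```python
import math


def jacobi_eigh(Z, tol=1e-14, max_sweeps=100):
    """Eigenvalues theta and orthonormal eigenvectors U (as columns) of a real
    symmetric matrix, by cyclic Jacobi rotations, so that Z = U diag(theta) U^T."""
    n = len(Z)
    A = [[float(x) for x in row] for row in Z]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in A for x in row)  # Frobenius norm^2, rotation invariant
    for _ in range(max_sweeps):
        off = sum(A[i][j] * A[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated A[p][q] vanishes.
                tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # U <- U J accumulates the eigenvectors
                    ukp, ukq = U[k][p], U[k][q]
                    U[k][p], U[k][q] = c * ukp - s * ukq, s * ukp + c * ukq
    return [A[i][i] for i in range(n)], U


def isometric_vectors(Z):
    """Return D = Lambda^(1/2) U^T (Eq. (25)) and the signature of Eq. (21)."""
    theta, U = jacobi_eigh(Z)
    M = len(Z)
    gamma = [(t > 0) - (t < 0) for t in theta]  # delta(theta > 0) - delta(theta < 0)
    root = [math.sqrt(abs(t)) for t in theta]   # Lambda^(1/2), Eq. (22)
    # Column I of D is the isometric vector |d_I>: D[a][I] = sqrt(|theta_a|) U[I][a].
    D = [[root[a] * U[I][a] for I in range(M)] for a in range(M)]
    return D, gamma


def metric_gram(D, gamma):
    """z_IJ = sum_a gamma_a D[a][I] D[a][J], i.e. D^T Gamma D as in Eq. (24)."""
    M = len(D[0])
    return [[sum(gamma[a] * D[a][I] * D[a][J] for a in range(len(D)))
             for J in range(M)] for I in range(M)]


# Hypothetical symmetric, non-definite similarity matrix (its determinant is negative).
Z = [[1.0, 2.0, 0.0], [2.0, 1.0, 0.5], [0.0, 0.5, 1.0]]
D, gamma = isometric_vectors(Z)
Z_iso = metric_gram(D, gamma)         # Eq. (24): recovers Z
Z_syn = metric_gram(D, [1] * len(Z))  # synisometric shortcut Gamma_M -> I_M
print(gamma)
print(max(abs(Z_iso[i][j] - Z[i][j]) for i in range(3) for j in range(3)))
```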

5.2 The nature of the discrete representation of a molecular set

As a consequence of the discussion in Sect. 5.1, in both QSPR environments one can describe a molecular set as a set of vectors belonging to a vector space of similar characteristics and dimension. However, even if the discrete vector set \(\mathbb{D}\) seems equivalent in the classical and quantum procedures, its nature is quite different.

The construction of the classical vector set \(\mathbb{D}\) is arbitrary in many respects, as it faces intrinsic difficulties in distinguishing and including as descriptors some molecular attributes, such as conformations, electronic states, and optical isomers…, to mention some obvious ones. This holds even when the algorithms operating in CQSPR and AI claim a quantum origin for the parameters or descriptors employed in the molecular description; see the already cited papers [6, 25, 26, 31, 32, 34, 36].

Such drawbacks are not necessarily present in the quantum determination of the elements of \(\mathbb{D}\). The precise and general character of quantum procedures must always be present in the systematic construction and use of the vector set \(\mathbb{D}\).

Even bearing in mind this drawback of the classical approach compared with the quantum algorithm, one can state an additional point in the data manipulation of a QSPR molecular set representation:

  (f) One can transform the vector representation \(\mathbb{D}\), attached to any molecular set \(\mathbb{M}\), into an isometric vector set that generates the original similarity matrix and can be further used to describe \(\mathbb{M}\).

6 The vector set \(\mathbb{D}\) geometric connection: construction of a molecular polyhedron and the statistical-like manipulation of its elements

This section summarizes the theoretical background for constructing a set of statistical-like parameters that can describe any molecular set \(\mathbb{M}\) in a condensed manner via the vectors of the descriptor set \(\mathbb{D}\).

Such statistical-like parameters can also be associated with a set of collective distances [90].

Theoretical, computational, and practical details of this issue have been published in many instances [88, 89, 93, 94, 98, 99, 102], so only the backbone of this general possibility will be described here.

6.1 Molecular polyhedra: centroid and origin shift

The first element of this descriptor transformation regards the set \(\mathbb{D}\) as a many-dimensional polyhedron or polytope (the first name will be preferred here), that is, a mathematical object with M vertices, each associated with one vector of the set \(\mathbb{D}\). This geometric image of the set \(\mathbb{D}\) can also be called a molecular polyhedron.

Admitting this, the centroid of this geometrical structure is easily calculated by:

$${\mathbb{D}} = \left\{ {\left| {{\mathbf{d}}_{I} } \right\rangle \left| {I = 1,M} \right.} \right\} \subset {\mathbb{V}}_{M} \left( {\mathbb{R}} \right) \to \exists \left| {\mathbf{c}} \right\rangle = M^{ - 1} \sum\limits_{I = 1}^{M} {\left| {{\mathbf{d}}_{I} } \right\rangle } \to \left| {\mathbf{c}} \right\rangle \in {\mathbb{V}}_{M} \left( {\mathbb{R}} \right).$$
(26)

From there, one can follow the systematic transformation of the description of the set \(\mathbb{M}\) by defining a new vector descriptor set possessing a zero centroid, namely:

$$\begin{gathered} {\mathbb{G}} \left\{ {\left| {{\mathbf{g}}_{I} } \right\rangle = \left| {{\mathbf{d}}_{I} } \right\rangle - \left| {\mathbf{c}} \right\rangle \left| {I = 1,M} \right.} \right\} \to \hfill \\ M^{ - 1} \sum\limits_{I = 1}^{M} {\left| {{\mathbf{g}}_{I} } \right\rangle = } M^{ - 1} \sum\limits_{I = 1}^{M} {\left( {\left| {{\mathbf{d}}_{I} } \right\rangle - \left| {\mathbf{c}} \right\rangle } \right) = } M^{ - 1} \sum\limits_{I = 1}^{M} {\left| {{\mathbf{d}}_{I} } \right\rangle - \left| {\mathbf{c}} \right\rangle = } \left| {\mathbf{0}} \right\rangle , \hfill \\ \end{gathered}$$
(27)

a result that provides a general algorithm transforming any molecular descriptor set into a new vector set \(\mathbb{G}\) with a null centroid.
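The centroid and origin shift of Eqs. (26) and (27) amount to subtracting, row by row, the mean over all molecules. The NumPy sketch below assumes that the descriptor vectors \(\left| {{\mathbf{d}}_{I} } \right\rangle\) are stored as the columns of an \(\left( {M \times M} \right)\) array; the function name `origin_shift` and the numerical data are hypothetical, chosen only for illustration.

```python
import numpy as np

def origin_shift(D: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the centroid |c> (Eq. 26) and the origin-shifted set G (Eq. 27).

    D holds the descriptor vectors |d_I> as its columns, so D has shape (M, M).
    """
    M = D.shape[1]
    c = D.sum(axis=1) / M           # |c> = M^-1 sum_I |d_I>
    G = D - c[:, np.newaxis]        # |g_I> = |d_I> - |c> for every column I
    return c, G

# Hypothetical 3-molecule descriptor set (columns = molecules), illustration only.
D = np.array([[1.0, 2.0, 0.0],
              [0.5, 1.5, 1.0],
              [2.0, 0.0, 1.0]])
c, G = origin_shift(D)
print(c)                # centroid of the molecular polyhedron: [1. 1. 1.]
print(G.mean(axis=1))   # centroid of G, null up to rounding
```

Storing molecules as columns keeps the indexing of Eq. (31), where \(g_{JI}\) is the J-th component of the I-th vector.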

In general, this redescription can be performed on any molecular set. It yields a molecular polyhedron whose centroid coincides with the origin, the null vector \(\left| {\mathbf{0}} \right\rangle\).

One can refer to this transformation as an origin shift of the initial molecular polyhedron described by the vector elements of the set \(\mathbb{D}\). This possibility allows a new point to be added to the set of QSPR data conditions already described:

  1. (g)

    Knowing an isometric vector representation \(\mathbb{D}\) of a molecular set \(\mathbb{M}\) is the same as defining a molecular polyhedron. Then a centroid can be calculated. Hence, an origin shift can be performed on \(\mathbb{D}\), producing a new descriptor set, the origin-shifted molecular polyhedron \(\mathbb{G}\), bearing a zero centroid.

This M-dimensional origin shift is the counterpart of the usual statistical practice, for one-dimensional scalar sets, of subtracting the arithmetic mean from every element of the set.

6.2 Momenta of the molecular polyhedron

Once the origin-shifted molecular polyhedron \(\mathbb{G}\) has been generated, its elements can be used to construct a set of statistical-like momenta, that is, a set of vectors that condense the information contained in the vector representation of the molecular set \(\mathbb{M}\).

The vector momenta can be described as averages, over the origin-shifted polyhedron, of the successive inward powers of its vectors. They are defined as:

$$P \in {\mathbb{N}}:\left| {{\mathbf{m}}_{P} } \right\rangle = M^{ - 1} \Gamma_{M} \sum\limits_{I = 1}^{M} {\left| {{\mathbf{g}}_{I}^{\left[ P \right]} } \right\rangle } \equiv M^{ - 1} \Gamma_{M} \sum\limits_{I = 1}^{M} {\left( {\left| {{\mathbf{d}}_{I} } \right\rangle - \left| {\mathbf{c}} \right\rangle } \right)^{\left[ P \right]} } ,$$
(28)

where the presence of the diagonal metric matrix \(\Gamma_{M}\) associated with the QSPR problem should be noted.

In Eq. (28), the inward P-th power of a vectorFootnote 8 is defined elementwise by the following algorithm (see, for example, [89] for more details):

$$\forall \left| {\mathbf{v}} \right\rangle = \left( {v_{1} ,v_{2} ,...v_{M} } \right)^{T} \in {\mathbb{V}}_{M} :\left| {{\mathbf{v}}^{\left[ P \right]} } \right\rangle = \left( {v_{1}^{P} ,v_{2}^{P} ,...v_{M}^{P} } \right)^{T} \in {\mathbb{V}}_{M} .$$
(29)

Owing to the vector power definition (29), the molecular polyhedron momenta defined in Eq. (28) play, in the framework of vector spaces, the same role as the statistical moments of a set of scalars. Thus, the vectors described in Eq. (28) can be called the statistical-like vector momenta of the molecular polyhedron.

Then, the centroid corresponds to the first-order momentum (\(P = 1\)), which becomes null for the origin-shifted molecular polyhedra. The vector momenta calculated with the values \(P = 2,3,4\) play roles analogous to variance, skewness, and kurtosis, but for vector sets instead of scalar sets.
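Equations (28) and (29) translate directly into array operations: the inward power is NumPy's elementwise `**`, and the diagonal metric matrix acts as an elementwise product with its diagonal. The sketch below assumes that the descriptor vectors are the columns of an \(\left( {M \times M} \right)\) array; the signature diagonal `gamma_diag`, the function name, and the data are hypothetical.

```python
import numpy as np

def vector_momentum(D, gamma_diag, P):
    """Statistical-like vector momentum |m_P> of Eq. (28).

    D          : (M, M) array whose columns are the descriptor vectors |d_I>.
    gamma_diag : length-M diagonal of the metric (signature) matrix Gamma_M.
    P          : momentum order (natural number).
    """
    M = D.shape[1]
    G = D - D.mean(axis=1, keepdims=True)      # origin shift, Eq. (27)
    G_pow = G ** P                             # inward power of every |g_I>, Eq. (29)
    return gamma_diag * G_pow.sum(axis=1) / M  # M^-1 Gamma_M sum_I |g_I^[P]>

# Hypothetical data: columns are molecules; one negative metric entry.
D = np.array([[1.0, 2.0, 0.0],
              [0.5, 1.5, 1.0],
              [2.0, 0.0, 1.0]])
gamma_diag = np.array([1.0, 1.0, -1.0])
momenta = {P: vector_momentum(D, gamma_diag, P) for P in (1, 2, 3, 4)}
for P, m in momenta.items():
    print(P, m)   # P = 1 gives the null vector, as stated in the text
```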

The vector momenta can be reduced to scalars simply by taking the complete sum of the elementsFootnote 9 of every momentum vector, where the complete sum of the elements of any vector is defined (see, for example, [89]) as:

$$\forall \left| {\mathbf{v}} \right\rangle = \left( {v_{1} ,v_{2} ,...v_{M} } \right)^{T} \in {\mathbb{V}}_{M} :\left\langle {\left| {\mathbf{v}} \right\rangle } \right\rangle = \sum\limits_{I = 1}^{M} {v_{I} } .$$
(30)

Computing the complete sum of a momentum vector also makes it possible to describe the contribution of every molecular structure to the total momentum. Since \(\Gamma_{M}\) is diagonal, the metric matrix can be treated as a vector entering a scalar product, that is:

$$\left\langle {\left| {{\mathbf{m}}_{P} } \right\rangle } \right\rangle = M^{ - 1} \sum\limits_{I = 1}^{M} {\left\langle {\Gamma_{M} \left| {{\mathbf{g}}_{I}^{\left[ P \right]} } \right\rangle } \right\rangle } = M^{ - 1} \sum\limits_{I = 1}^{M} {\left( {\sum\limits_{J = 1}^{M} {\Gamma_{M;JJ} g_{JI}^{P} } } \right)} = M^{ - 1} \sum\limits_{I = 1}^{M} {\gamma_{I}^{\left( P \right)} } .$$
(31)

Then, the set of elements:

$$\left\{ {\gamma_{I}^{\left( P \right)} = \sum\limits_{J = 1}^{M} {\Gamma_{M;JJ} g_{JI}^{P} } \left| {I = 1,M} \right.} \right\}$$
(32)

can be arranged as a column vector \(\left| {{{\varvec{\upgamma}}}^{\left( P \right)} } \right\rangle = \left( {\gamma_{1}^{\left( P \right)} ,\gamma_{2}^{\left( P \right)} ,...\gamma_{M}^{\left( P \right)} } \right)^{T}\), whose elements can be used as coordinates to locate the molecules of the set \(\mathbb{M}\) as points in a Cartesian plot. For example, taking \(P = 2,3,4\), every molecule in the set \(\mathbb{M}\) can be drawn as a three-dimensional point \(\left\{ {\gamma_{J}^{\left( 2 \right)} ,\gamma_{J}^{\left( 3 \right)} ,\gamma_{J}^{\left( 4 \right)} } \right\}\left( {J = 1,M} \right)\).
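The per-molecule values of Eq. (32) are scalar products of the diagonal of \(\Gamma_{M}\) with each column of the inward power of \(\mathbb{G}\), and their average reproduces the complete sum of Eq. (31). The following sketch, with a hypothetical function name and illustrative data, returns one row of coordinates per molecule, ready to be passed to any plotting library (plotting itself is not shown).

```python
import numpy as np

def momentum_coordinates(D, gamma_diag, orders=(2, 3, 4)):
    """Per-molecule values gamma_I^(P) of Eq. (32); row I = coordinates of molecule I."""
    G = D - D.mean(axis=1, keepdims=True)     # origin-shifted vertices |g_I>
    # gamma_I^(P) = sum_J Gamma_{M;JJ} g_{JI}^P, i.e. diag(Gamma) . |g_I^[P]>
    return np.column_stack([gamma_diag @ (G ** P) for P in orders])

# Hypothetical data (columns = molecules) and signature diagonal.
D = np.array([[1.0, 2.0, 0.0],
              [0.5, 1.5, 1.0],
              [2.0, 0.0, 1.0]])
gamma_diag = np.array([1.0, 1.0, -1.0])
coords = momentum_coordinates(D, gamma_diag)   # shape (M, 3): one 3-D point per molecule

# Consistency with Eq. (31): complete sum of |m_2> equals the average of gamma_I^(2).
G = D - D.mean(axis=1, keepdims=True)
m2 = gamma_diag * (G ** 2).sum(axis=1) / D.shape[1]
print(np.isclose(m2.sum(), coords[:, 0].mean()))   # True
```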

Therefore, the momenta of origin-shifted molecular polyhedra can be used to characterize a molecular set and, in this manner, to visualize a possible reordering within the molecular set \(\mathbb{M}\) by drawing each molecule as a point in a 1-, 2-, or 3-dimensional graph.

7 The QQSPR operator

Since the first published applications of quantum similarity, the possibility of defining a QQSPR operator has been a recurring speculation; see, for example, references [44, 45, 50, 60, 65]. More recently, the possibility of defining an operator that, applied to the density function, provides an expectation value connected with a given property of interest has been discussed, and some examples have been given [78].

That is, one could construct a Hermitian operator \(\Omega \left( {\mathbf{r}} \right)\) such that, using a quantum mechanical molecular electronic density function \(\rho \left( {\mathbf{r}} \right)\) as a distribution function, the expectation value of some property \(\pi\) can be computed through the integral:

$$\pi = \int_{D} {\Omega \left( {\mathbf{r}} \right)\rho \left( {\mathbf{r}} \right)} \, d{\mathbf{r}}.$$
(33)

When solving a QQSPR problem, and recalling the nature, properties, and mathematical description of a molecular set \(\mathbb{M}\) given at the beginning of the present work, one is assumed to know a set of electronic density functions \({\mathbb{P}} \left\{ {\rho_{I} \left( {\mathbf{r}} \right)\left| {I = 1,M} \right.} \right\}\), in one-to-one correspondence with the elements of the molecular set, that is, \({\mathbb{M}} \Leftrightarrow {\mathbb{P}}\).

7.1 Quantum similarity matrices

Quantum similarity permits the construction of the so-called similarity matrix via volume integrals, which can generally be written as:

$${\mathbf{Z}} = \left\{ {Z_{IJ} = \int_{D} {\int_{D} {\rho_{I} \left( {{\mathbf{r}}_{1} } \right)O\left( {{\mathbf{r}}_{1} ,{\mathbf{r}}_{2} } \right)\rho_{J} \left( {{\mathbf{r}}_{2} } \right)d{\mathbf{r}}_{1} d{\mathbf{r}}_{2} \, \left| {I,J = 1,M} \right.} } } \right\},$$
(34)

where the weight operator \(O\left( {{\mathbf{r}}_{1} ,{\mathbf{r}}_{2} } \right)\) can be chosen positive definite. Usually, the Dirac delta function \(\delta \left( {{\mathbf{r}}_{1} - {\mathbf{r}}_{2} } \right)\) is used, which transforms the integrals in Eq. (34) into the elements of the so-called quantum overlap similarity matrix:

$$\int_{D} {\int_{D} {\rho_{I} \left( {{\mathbf{r}}_{1} } \right)\delta \left( {{\mathbf{r}}_{1} - {\mathbf{r}}_{2} } \right)\rho_{J} \left( {{\mathbf{r}}_{2} } \right)d{\mathbf{r}}_{1} d{\mathbf{r}}_{2} { = }\int_{D} {\rho_{I} \left( {\mathbf{r}} \right)} \rho_{J} \left( {\mathbf{r}} \right)d{\mathbf{r}}.} }$$
(35)

Thus, the quantum similarity matrix is usually computed as the integral array:

$${\mathbf{Z}} = \left\{ {Z_{IJ} = \int_{D} {\rho_{I} \left( {\mathbf{r}} \right)\rho_{J} \left( {\mathbf{r}} \right)d{\mathbf{r}} = Z_{JI} \left| {I,J = 1,M} \right.} } \right\},$$
(36)

an expression formally connected with the discrete construction of Eq. (10). The manipulation described in Sect. 5 is applicable because Z is a symmetric \(\left( {M \times M} \right)\) matrix. A discrete isometric vector set of quantum mechanical origin can then be obtained, as described in Sect. 5.1, thereby connecting CQSPR with QQSPR.
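As a purely illustrative sketch of Eq. (36), the code below assumes that each molecule is represented by a single normalized spherical Gaussian density \(\rho_{I} \left( {\mathbf{r}} \right) = \left( {a_{I} /\pi } \right)^{3/2} \exp \left( { - a_{I} \left| {{\mathbf{r}} - {\mathbf{R}}_{I} } \right|^{2} } \right)\) at a fixed centre, a drastic simplification of the ASA densities mentioned in Sect. 7.4.1. Under that hypothetical assumption, the Gaussian product theorem gives each overlap in closed form, \(Z_{IJ} = \left( {a_{I} a_{J} /\left[ {\pi \left( {a_{I} + a_{J} } \right)} \right]} \right)^{3/2} \exp \left( { - a_{I} a_{J} \left| {{\mathbf{R}}_{I} - {\mathbf{R}}_{J} } \right|^{2} /\left( {a_{I} + a_{J} } \right)} \right)\). Exponents, centres, and the function name are invented, and no superposition (maximization) step is performed.

```python
import numpy as np

def gaussian_overlap_matrix(alphas, centers):
    """Overlap similarity matrix Z of Eq. (36) for toy single-Gaussian densities.

    Molecule I is modeled (hypothetically) by rho_I(r) = (a_I/pi)^(3/2) exp(-a_I |r - R_I|^2);
    the integral of rho_I * rho_J then follows from the Gaussian product theorem.
    """
    a = np.asarray(alphas, dtype=float)
    R = np.asarray(centers, dtype=float)
    a_sum = a[:, None] + a[None, :]
    a_prod = a[:, None] * a[None, :]
    dist2 = ((R[:, None, :] - R[None, :, :]) ** 2).sum(axis=-1)   # |R_I - R_J|^2
    return (a_prod / (np.pi * a_sum)) ** 1.5 * np.exp(-a_prod / a_sum * dist2)

alphas = [0.8, 1.0, 1.3]
Z = gaussian_overlap_matrix(alphas, [[0, 0, 0], [0.5, 0, 0], [0, 0.7, 0]])
print(np.allclose(Z, Z.T))   # True: Z_IJ = Z_JI, as in Eq. (36)
```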

The integrals entering Eq. (36) are computed by superimposing (rotating and translating) each molecular pair \(\left\{ {m_{I} ,m_{J} } \right\}\),Footnote 10 until the corresponding integral becomes maximal; see, for example, reference [103].

This pairwise manipulation of the molecules produces overlap quantum similarity matrices that are, in general, not positive definite, which leads to the presence of a metric signature matrix \(\Gamma_{M}\), as defined in Eq. (21), in the subsequent use of the isometric vector set \(\mathbb{D}\).
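Section 5.1 is not reproduced here, so the following is only one standard way, assumed for illustration, to obtain isometric vectors from a symmetric but indefinite similarity matrix: diagonalize \({\mathbf{Z}} = {\mathbf{U}}\Lambda {\mathbf{U}}^{T}\), take \({\mathbf{D}} = \left| \Lambda \right|^{1/2} {\mathbf{U}}^{T}\) (columns \(\left| {{\mathbf{d}}_{I} } \right\rangle\)) and \(\Gamma_{M} = {\text{sign}}\left( \Lambda \right)\), so that \({\mathbf{Z}} = {\mathbf{D}}^{T} \Gamma_{M} {\mathbf{D}}\). The matrix entries and the function name are hypothetical; the matrix was chosen to have one negative eigenvalue.

```python
import numpy as np

def isometric_vectors(Z):
    """Return D (columns |d_I>) and diag(Gamma_M) such that Z = D^T Gamma_M D.

    Assumed construction: spectral decomposition of the symmetric matrix Z.
    Negative eigenvalues (indefinite Z) are absorbed into the signature Gamma_M.
    """
    lam, U = np.linalg.eigh(Z)                    # Z = U diag(lam) U^T
    gamma_diag = np.where(lam >= 0.0, 1.0, -1.0)  # signature entries +1 / -1
    D = np.sqrt(np.abs(lam))[:, None] * U.T       # D = |Lambda|^(1/2) U^T
    return D, gamma_diag

# Hypothetical symmetric similarity matrix with a negative eigenvalue.
Z = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
D, gamma_diag = isometric_vectors(Z)
print(gamma_diag)                                      # signature of Z
print(np.allclose(D.T @ np.diag(gamma_diag) @ D, Z))   # True
```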

7.2 The QQSPR operator

To solve any QQSPR problem, it is not sufficient to construct the quantum similarity matrix. Equation (33) must also be used to build an algorithm that evaluates the unknown property values of some molecular elements present in a molecular set.

A possible way to obtain an adequate QQSPR algorithm is to construct the QQSPR operator as a Taylor-like series [110], inspired by earlier work of Löwdin [104,105,106], with each molecular density function acting as a variable. Assuming this option, we can write:

$$\forall I = 1,M:\Omega \left( {\mathbf{r}} \right) \equiv \Omega \left( {\rho_{I} \left( {\mathbf{r}} \right)} \right) = \sum\limits_{P = 0}^{\infty } {\omega_{P} \left[ {\rho_{I} \left( {\mathbf{r}} \right)} \right]^{P} } ,$$
(37)

where the coefficient set \(\left\{ {\omega_{P} \left| {P = 0,\infty } \right.} \right\}\), taken as constant over the whole molecular set, characterizes the operator (37), and thus its expectation values, for a given problem; these coefficients multiply the powers of the density function associated with each molecule in the set \(\mathbb{M}\).

Equation (37) is a general Taylor-like expansion of the sought QQSPR operator; therefore, the inverse factorials can be written explicitly by substituting the coefficient set \(\left\{ {\omega_{P} \left| {P = 0,\infty } \right.} \right\}\) with the set:

$$\forall P = 0,\infty :\omega_{P} = \frac{{w_{P} }}{P!},$$
(38)

although this possibility will not be explicitly used in this discussion, such a scaling of the Taylor-like series coefficients might be advantageous in practical computations.

Using Eq. (37), Eq. (33) can now be written for a specific molecular structure \(m_{I}\), yielding:

$$\forall I = 1,M:\pi_{I} = \sum\limits_{P = 0}^{\infty } {\omega_{P} \int_{D} {\left[ {\rho_{I} \left( {\mathbf{r}} \right)} \right]^{P + 1} } } d{\mathbf{r}},$$
(39)

a result that requires integrals of the density function powers up to some order. Estimating a molecular property therefore requires the calculation of self-similarities up to some chosen order Q + 1, which in this way truncates the Taylor-like series (37).
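The step from Eq. (33) with the truncated operator (37) to Eq. (39) can be checked numerically. To keep the quadrature cheap, the sketch below uses a hypothetical one-dimensional normalized Gaussian density and invented coefficients \(\omega_{0} ,\omega_{1} ,\omega_{2}\) (so Q = 2); in one dimension, \(\int {\rho^{P + 1} } dx = \left( {a/\pi } \right)^{P/2} /\sqrt {P + 1}\) provides a closed-form reference.

```python
import numpy as np

# Hypothetical 1-D toy density and truncated operator coefficients (illustration only).
a, omegas = 1.5, np.array([0.2, -0.7, 1.1])        # omega_0..omega_Q with Q = 2
x = np.linspace(-12.0, 12.0, 20001)
dx = x[1] - x[0]
rho = np.sqrt(a / np.pi) * np.exp(-a * x**2)       # normalized: sum(rho) * dx ~ 1

# Eq. (33) with the truncated operator of Eq. (37): Omega(rho) = sum_P omega_P rho^P
omega_of_rho = sum(w * rho**P for P, w in enumerate(omegas))
pi_direct = np.sum(omega_of_rho * rho) * dx

# Eq. (39): the same value from the integrals of rho^(P+1)
pi_series = sum(w * np.sum(rho ** (P + 1)) * dx for P, w in enumerate(omegas))

# Closed form in 1-D: integral of rho^(P+1) = (a/pi)^(P/2) / sqrt(P+1)
pi_exact = sum(w * (a / np.pi) ** (P / 2) / np.sqrt(P + 1) for P, w in enumerate(omegas))
print(pi_direct, pi_series, pi_exact)
```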

Note that the set of expectation values \({{\varvec{\Pi}}} = \left\{ {\pi_{I} \left| {I = 1,M} \right.} \right\}\) is not the same as the initial set of property values P described in point (b) of Sect. 2.

As explained before, unlike the elements of the overlap quantum similarity matrix, the integrals in expression (39) do not present the superposition problem, because each of them involves a single molecular structure. However, as the density function power grows, integral computation times might become a problem, even with simplified densities and modern computer facilities.

7.3 QQSPR equations

The integrals described in Eq. (39) can be collected in the following array:

$${\mathbf{Q}} = \left\{ {\int_{D} {\left[ {\rho_{I} \left( {\mathbf{r}} \right)} \right]^{P + 1} } d{\mathbf{r}} = Q_{IP} \left| {I = 1,M \wedge P \in {\mathbb{N}}} \right.} \right\},$$
(40)

so, using the vectors \(\left| {{\varvec{\uppi}}} \right\rangle = \left( {\pi_{1} ,\pi_{2} ,...\pi_{M} } \right)^{T}\) and \(\left| {{\varvec{\upomega}}} \right\rangle = \left( {\omega_{0} ,\omega_{1} ,...\omega_{P} ,...} \right)^{T}\), one can compactly rewrite Eq. (39) as:

$${\mathbf{Q}}\left| {{\varvec{\upomega}}} \right\rangle = \left| {{\varvec{\uppi}}} \right\rangle .$$
(41)
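Under the hypothetical assumption that each molecule is modelled by one normalized spherical Gaussian \(\rho_{I} \left( {\mathbf{r}} \right) = \left( {a_{I} /\pi } \right)^{3/2} \exp \left( { - a_{I} r^{2} } \right)\), every element of the array in Eq. (40) has the closed form \(Q_{IP} = \left( {a_{I} /\pi } \right)^{3P/2} \left( {P + 1} \right)^{ - 3/2}\). The sketch below builds this matrix truncated at \(P = Q\); each element depends on one molecule only, so no superposition is involved. The exponents and the function name are invented for the example.

```python
import numpy as np

def gaussian_self_similarity_matrix(alphas, Q):
    """Matrix of Eq. (40), truncated at P = Q, for toy single-Gaussian densities.

    For rho_I(r) = (a_I/pi)^(3/2) exp(-a_I r^2), the integral of rho_I^(P+1)
    over R^3 equals (a_I/pi)^(3P/2) * (P+1)^(-3/2).
    """
    a = np.asarray(alphas, dtype=float)[:, None]   # one row per molecule I
    P = np.arange(Q + 1, dtype=float)[None, :]     # one column per power P
    return (a / np.pi) ** (1.5 * P) * (P + 1.0) ** -1.5

Qmat = gaussian_self_similarity_matrix([0.8, 1.0, 1.3, 1.7], Q=2)
print(Qmat)   # column P = 0: normalization (1); column P = 1: self-similarities Z_II
```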

Equation (41) can be solved by constructing a pseudoinverse matrix: multiplying Eq. (41) on the left by \({\mathbf{Q}}^{T}\) gives the expression:

$${\mathbf{Q}}^{T} {\mathbf{Q}}\left| {{\varvec{\upomega}}} \right\rangle = {\mathbf{Q}}^{T} \left| {{\varvec{\uppi}}} \right\rangle \Rightarrow \left| {{\varvec{\upomega}}} \right\rangle = \left( {{\mathbf{Q}}^{T} {\mathbf{Q}}} \right)^{ - 1} {\mathbf{Q}}^{T} \left| {{\varvec{\uppi}}} \right\rangle .$$
(42)

This matrix expression permits computing the coefficient vector \(\left| {{\varvec{\upomega}}} \right\rangle\), from which the QQSPR operator can be constructed, provided that \({\mathbf{Q}}^{T} {\mathbf{Q}}\) is invertible.
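A minimal sketch of Eq. (42), assuming synthetic noise-free data (the matrix, the coefficients, and the function name are hypothetical): solving the normal equations as a linear system avoids forming the explicit inverse, and an SVD-based least-squares solver serves as a cross-check.

```python
import numpy as np

def qqspr_coefficients(Qmat, pi):
    """Coefficient vector |omega> of Eq. (42) from the normal equations.

    Solves (Q^T Q) omega = Q^T pi; requires full column rank (M >= Q + 1).
    """
    return np.linalg.solve(Qmat.T @ Qmat, Qmat.T @ pi)

# Synthetic, noise-free data (hypothetical): M = 6 molecules, order Q = 2.
rng = np.random.default_rng(1)
Qmat = rng.normal(size=(6, 3))
omega_true = np.array([0.3, -1.2, 0.8])
pi = Qmat @ omega_true

omega = qqspr_coefficients(Qmat, pi)
omega_lstsq, *_ = np.linalg.lstsq(Qmat, pi, rcond=None)   # SVD-based cross-check
print(np.allclose(omega, omega_true), np.allclose(omega, omega_lstsq))   # True True
```

When the columns of Q grow rapidly with P and the normal equations become ill-conditioned, the SVD-based `lstsq` route is usually the safer choice.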

7.4 Practical use of the QQSPR equation

Using the QQSPR Eq. (39) to obtain unknown property values from the density function and the QQSPR operator involves two aspects. Equation (42) yields the coefficients defining the operator up to some approximation, for instance \(\left\{ {P = 0,Q} \right\}\), by using the elements of the matrix Q obtained from Eq. (40) and some known values of the property.

7.4.1 Approximate matrix Q

The first aspect to consider is how to compute the elements of the matrix Q in practice; it is studied here in some detail.

One can easily construct the matrix Q with the aid of ASA approximate electronic density functions [107, 108] and an extension of the Gaussian product theorem, published some years ago [109].

However, a better way to avoid the large amount of computation needed to implement Eq. (40) is to use the set of isometric vectors previously defined in Sects. 5.1 and 6.1.

Both kinds of vectors can be used. If the raw isometric vector set \(\mathbb{D}\) is chosen as the finite-dimensional vector set representing the molecular set, Eq. (40) can be rewritten as:

$$\forall \left| {{\mathbf{d}}_{I} } \right\rangle \in {\mathbb{D}}:{\mathbf{Q}} = \left\{ {\left\langle {\Gamma_{M} \left| {{\mathbf{d}}_{I}^{{\left[ {P + 1} \right]}} } \right\rangle } \right\rangle = Q_{IP} \left| {I = 1,M \wedge P \in {\mathbb{N}}} \right.} \right\}.$$
(43)

Note in Eq. (43) that the inward powers of the isometric vectors must carry the signature metric matrix.
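Equation (43) needs only the isometric vectors and the signature diagonal. The sketch below builds hypothetical isometric vectors from a small symmetric matrix Z by the spectral construction \({\mathbf{Z}} = {\mathbf{D}}^{T} \Gamma_{M} {\mathbf{D}}\) (an assumption made for illustration, not necessarily the exact procedure of Sect. 5.1) and then forms the approximate matrix. For \(P = 1\), the column reproduces the self-similarities \(Z_{II}\) exactly, which serves as a built-in check.

```python
import numpy as np

def isometric_q_matrix(D, gamma_diag, Q):
    """Approximate matrix of Eq. (43): Q_IP = sum_J Gamma_JJ d_JI^(P+1), P = 0..Q."""
    # `**` binds tighter than `@`: inward power first, then the weighted complete sum
    return np.column_stack([gamma_diag @ D ** (P + 1) for P in range(Q + 1)])

# Hypothetical isometric vectors, built (as an assumption) from a symmetric Z
# via Z = U diag(lam) U^T, D = diag(sqrt|lam|) U^T, Gamma = diag(sign lam).
Z = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.9], [0.2, 0.9, 1.0]])
lam, U = np.linalg.eigh(Z)
gamma_diag = np.where(lam >= 0.0, 1.0, -1.0)
D = np.sqrt(np.abs(lam))[:, None] * U.T

Qapprox = isometric_q_matrix(D, gamma_diag, Q=3)
print(np.allclose(Qapprox[:, 1], np.diag(Z)))   # True: P = 1 gives Z_II
```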

One can obtain a similar expression with the vertices of the molecular polyhedron \(\mathbb{G}\), which, as earlier described, are the isometric vectors of the set \(\mathbb{D}\) origin-shifted by the centroid. Because this substitution is trivial, it will no longer be mentioned in the following, but one has to keep in mind both possible alternative uses of the sets \(\mathbb{D}\) or \(\mathbb{G}\).

Obtaining an approximate Q matrix via Eq. (43) permits finding the QQSPR operator coefficients in the same way as in the exact solution (42).

Another parameter must be accounted for: the truncation limit of the series defining the QQSPR operator. For a given problem, one can set the maximal power to \(\max \left( P \right) = Q\) and construct the matrices and vectors accordingly. From now on, this limit is used wherever the equations require it.

7.4.2 Use of the QQSPR operator to estimate unknown values of some property

The second aspect concerns applying the QQSPR operator to the density of a molecule whose property is unknown, in order to evaluate it. Various facets of this problem, including variants and the general case, have already been studied and published [SS]. Here, it is only sketched and restricted to evaluating the property of a single molecule.

As a preliminary step, suppose that a new structure with an unknown property value \(\pi_{u}\) is added to the molecular set M of known property values. The problem now involves \(M + 1\) molecular structures, and the matrix Q must be enlarged accordingly, giving the structure:

$${\mathbf{Q}}_{a} = \left( {\begin{array}{*{20}c} {\mathbf{Q}} \\ {\left\langle {{\mathbf{q}}_{u} } \right|} \\ \end{array} } \right),$$
(44)

where the row vector \(\left\langle {{\mathbf{q}}_{u} } \right|\) now contains the new elements involving the added molecule, that is:

$$\left\langle {{\mathbf{q}}_{u} } \right| = \left\{ {\int_{D} {\left[ {\rho_{u} \left( {\mathbf{r}} \right)} \right]^{P + 1} } d{\mathbf{r}} = q_{uP} \left| {P = 0,Q} \right.} \right\},$$
(45)

or using the isometric vectors:

$$\left\langle {{\mathbf{q}}_{u} } \right| = \left\{ {\left\langle {\left\langle {{\mathbf{d}}_{u}^{{\left[ {P + 1} \right]}} } \right|\Gamma_{M} } \right\rangle = q_{uP} \left| {P = 0,Q} \right.} \right\}.$$
(46)

Equation (41) can then be rewritten as follows:

$$\left( {\begin{array}{*{20}c} {\mathbf{Q}} \\ {\left\langle {{\mathbf{q}}_{u} } \right|} \\ \end{array} } \right)\left| {{\varvec{\upomega}}} \right\rangle = \left( {\begin{array}{*{20}c} {\left| {{\varvec{\uppi}}} \right\rangle } \\ {\pi_{u} } \\ \end{array} } \right),$$
(47)

where now \(\left\{ {\left| {{\varvec{\upomega}}} \right\rangle ;\pi_{u} } \right\}\) are the unknowns to be found.

An iterative procedure can easily be set up in this case. Supposing that the order of the density power series is Q, one starts from an approximate value of the unknown property obtained with the (Q + 1)-dimensional coefficient vector calculated without the unknown-property molecule, say \(\left| {{{\varvec{\upomega}}}^{0} } \right\rangle\):

$$\pi_{u} \approx \sum\limits_{P = 0}^{Q} {\omega_{P}^{0} \int_{D} {\left[ {\rho_{u} \left( {\mathbf{r}} \right)} \right]^{P + 1} } } d{\mathbf{r}},$$
(48)

or

$$\pi_{u} \approx \sum\limits_{P = 0}^{Q} {\omega_{P}^{0} \left\langle {\Gamma_{M} \left| {{\mathbf{d}}_{u}^{{\left[ {P + 1} \right]}} } \right\rangle } \right\rangle } .$$
(49)

Inserting the approximate property value obtained from Eq. (48) or (49) into Eq. (47), one can solve Eq. (47) to obtain new component values of the vector \(\left| {{\varvec{\upomega}}} \right\rangle\):

$$\begin{gathered} \left( {\begin{array}{*{20}c} {{\mathbf{Q}}^{T} } & {\left| {{\mathbf{q}}_{u} } \right\rangle } \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\mathbf{Q}} \\ {\left\langle {{\mathbf{q}}_{u} } \right|} \\ \end{array} } \right)\left| {{\varvec{\upomega}}} \right\rangle = \left( {\begin{array}{*{20}c} {{\mathbf{Q}}^{T} } & {\left| {{\mathbf{q}}_{u} } \right\rangle } \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\left| {{\varvec{\uppi}}} \right\rangle } \\ {\pi_{u} } \\ \end{array} } \right) \Rightarrow \hfill \\ \left( {{\mathbf{Q}}^{T} {\mathbf{Q}} + \left| {{\mathbf{q}}_{u} } \right\rangle \left\langle {{\mathbf{q}}_{u} } \right|} \right)\left| {{\varvec{\upomega}}} \right\rangle = \left( {{\mathbf{Q}}^{T} \left| {{\varvec{\uppi}}} \right\rangle + \pi_{u} \left| {{\mathbf{q}}_{u} } \right\rangle } \right) \Rightarrow \hfill \\ \left| {{\varvec{\upomega}}} \right\rangle = \left( {{\mathbf{Q}}^{T} {\mathbf{Q}} + \left| {{\mathbf{q}}_{u} } \right\rangle \left\langle {{\mathbf{q}}_{u} } \right|} \right)^{ - 1} \left( {{\mathbf{Q}}^{T} \left| {{\varvec{\uppi}}} \right\rangle + \pi_{u} \left| {{\mathbf{q}}_{u} } \right\rangle } \right) \hfill \\ \end{gathered}$$
(50)

Note that the dimension of the coefficient vector does not change if the approximation order of Eq. (39) or (43) is kept at a constant value Q.

However, keeping the order Q constant is not compulsory, so the approximation order can be considered free to vary within the iteration cycles.

A refined value of the unknown property can now be obtained by alternating, up to convergence, between Eq. (50) and the evaluation of the unknown property with the refreshed coefficient vector \(\left| {{\varvec{\upomega}}} \right\rangle\) in the corresponding equation:

$$\pi_{u} \approx \sum\limits_{P = 0}^{Q} {\omega_{P} \int_{D} {\left[ {\rho_{u} \left( {\mathbf{r}} \right)} \right]^{P + 1} } } d{\mathbf{r}},$$
(51)

or

$$\pi_{u} \approx \sum\limits_{P = 0}^{Q} {\omega_{P} \left\langle {\Gamma_{M} \left| {{\mathbf{d}}_{u}^{{\left[ {P + 1} \right]}} } \right\rangle } \right\rangle } .$$
(52)

The iteration can stop when the change in the estimated unknown property between two successive iterations falls below a given threshold.
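The alternation between Eq. (50) and Eq. (52) can be sketched as follows, with the order Q held fixed; the data are synthetic and the function name is hypothetical. One property of this linear sketch is worth noting: with Q fixed, once \(\pi_{u}\) equals the prediction of Eq. (49), the augmented least-squares problem of Eq. (50) is already minimized by \(\left| {{{\varvec{\upomega}}}^{0} } \right\rangle\), so that value is the fixed point of the loop and a start from Eq. (48) or (49) stops after one step. The example therefore starts from an arbitrary guess to make the convergence visible.

```python
import numpy as np

def estimate_unknown_property(Qmat, pi, q_u, pi_u0=None, tol=1e-10, max_iter=1000):
    """Alternate Eqs. (50) and (52) for one added molecule with unknown property.

    Qmat  : (M, Q+1) matrix for the molecules with known property values pi.
    q_u   : length-(Q+1) row of the added molecule, Eq. (45) or (46).
    pi_u0 : starting guess; None means Eq. (49) with omega^0 from Eq. (42).
    """
    A, b = Qmat.T @ Qmat, Qmat.T @ pi
    omega = np.linalg.solve(A, b)                 # omega^0, without the new molecule
    pi_u = q_u @ omega if pi_u0 is None else pi_u0
    A_aug = A + np.outer(q_u, q_u)                # Q^T Q + |q_u><q_u|
    for _ in range(max_iter):
        omega = np.linalg.solve(A_aug, b + pi_u * q_u)   # Eq. (50)
        pi_new = q_u @ omega                             # Eq. (52)
        if abs(pi_new - pi_u) < tol:                     # convergence threshold
            return pi_new, omega
        pi_u = pi_new
    return pi_u, omega

# Synthetic, noise-free data (hypothetical): M = 8 molecules, order Q = 2.
rng = np.random.default_rng(0)
Qmat = rng.normal(size=(8, 3))
omega_true = np.array([0.5, -1.0, 2.0])
pi = Qmat @ omega_true
q_u = rng.normal(size=3)

pi_u, omega = estimate_unknown_property(Qmat, pi, q_u, pi_u0=0.0)
print(pi_u, q_u @ omega_true)   # the loop converges to the Eq. (49) prediction
```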

One can easily generalize the procedure when the unknown property is associated with several newly added molecular structures; for more details, see reference [103].

8 Conclusions

One has presented a complete review of the mathematical structure and algorithms used to solve QSPR problems in molecular spaces.

Throughout this study, classical and quantum procedures have been brought to the same footing and associated with the isometric description of the molecular vectors representing the elements of a molecular set.

The present study has described the whole algorithmic structure of molecular-space QSPR. It can be condensed into seven points, labeled within the text with the first seven letters of the alphabet: (a), (b), (c), (d), (e), (f), and (g). These seven points constitute the body of molecular-space QSPR procedures.

They can be summarized as follows:

  • One knows a molecular set associated with some property.

  • One can construct an appropriate dimension molecular space.

  • One can compute molecular self-similarities to reorder the molecular data.

  • One can assemble a molecular descriptor-independent vector set in molecular space.

  • One can build an isometric vector set.

  • One can construct a molecular polyhedron.

Finally, with the data resulting from the previous manipulations, one can set up the algorithm of QSPR in molecular space.

Applications of the various proposed algorithms will be presented in future work.