1 Introduction

The pairwise comparisons (PC) method was introduced in its early form by Fechner (1966), then it was popularized and developed by Thurstone (1994). The introduction of hierarchical structures by Saaty (2008) was another important contribution to the PC method, providing a methodology and practical means of dealing with large numbers of criteria. Initially the PC method was used in the scientific study of psychometrics and psychophysics (Thurstone 1994); however, it then came to be used in other areas of application, such as complex decision theory (Saaty 2008), economics (Peterson and Brown 1998), voting systems (Tideman 1987) and others. Since the data input to the PC method are the result of human judgment, inaccuracies easily occur. Hence, the input data set is frequently ambiguous and does not allow users to draw firm conclusions. There are several indexes of data inconsistency (Bozóki and Rapcsak 2008), including the best known, Saaty’s eigenvector method (Saaty 1980), the Least Squares Method, the Chi Squares Method, Koczkodaj’s distance based inconsistency index (Koczkodaj 1993), and others. Using these indexes, the reliability of the data, and hence the confidence in the result, can be assessed. The answer to the question of how consistent the input data must be to ensure a reliable result is the subject of empirical research. For instance, according to Saaty’s recommendation, every occurrence of a consistency ratio greater than or equal to \(0.1\) should lead to a re-examination of the pairwise judgments until the inconsistency becomes less than or equal to the desired value (Bozóki and Rapcsak 2008; Triantaphyllou et al. 1990). The main criticism of this approach relates to its detachment from the data and its inability to localize the most problematic matrix elements (Bozóki and Rapcsak 2008; Koczkodaj 1993; Triantaphyllou et al. 1990). In contrast, Koczkodaj’s inconsistency index has a meaningful interpretation and provides information about the location of the inconsistency, but it does not provide an exact answer to the question of how good the average data sample is. Inconsistency identified as too high must be reduced to an acceptable level (ideally to zero). Since the ratio coefficients that form the input to the PC method frequently represent experts’ judgements, a natural way of reducing inconsistency is to convene the expert panel once again and ask the gathered professionals to reach an agreement (Gomes 1993). Because such a solution is usually time-consuming and expensive, heuristic algorithms of inconsistency reduction have been proposed (Koczkodaj and Orłowski 1999; Koczkodaj and Szarek 2010; Gomes 1993; Xu and Wei 1999; Bozóki 2008; Temesi 2006; Cao et al. 2008). The result of these algorithms is a new set of data, which preserves most of the decision maker’s original judgment structure and significantly reduces the data inconsistency.

The proposed innovative solution approaches the problem differently. It does not attempt to minimize inconsistency in the data, but rather proposes a way of using the data that takes their inconsistency into account. Hence, knowing the exact values for a few concepts and some inconsistent set of ratios between them, the data analyst is able to estimate the values of all the other concepts with errors depending on the degree of data inconsistency. The presented approach is comparable to the inconsistency reduction methods mentioned above, since the set of concepts for which the estimates are known can be easily transformed into a consistent set of data (as addressed in Sect. 8).

The first and second sections of the article focus on the presentation of the necessary facts and definitions concerning the pairwise comparisons method. Section three formally formulates the problem considered in this work. The fourth and fifth sections provide the Heuristic Rating Estimation (HRE) algorithm together with a numerical example demonstrating the application of the algorithm in practice. The sixth section shows the relationship between the errors of the estimations obtained using the HRE algorithm and the data inconsistency. The next, seventh, section addresses the case in which the subsequent estimation sets converge to some limit. Finally, the last two sections contain the closing discussion and summary.

2 A pairwise comparisons method

A crucial part of the PC method is a PC matrix \(M=(m_{ij})\), where \(m_{ij}\in \mathbb R _{+}\) and \(i,j\in \{1,\ldots ,n\}\), which expresses some quantitative relation \(R\) over the finite set of concepts \(C\overset{df}{=}\{c_{i}\in \fancyscript{C}\wedge i\in \{1,\ldots ,n\}\}\), where \(\fancyscript{C}\) is a non-empty universe of concepts and \(R(c_{i},c_{j})=m_{ij},\, R(c_{j},c_{i})=m_{ji}\). Traditionally, these concepts are interpreted as subjective stimuli (Thurstone 1994), whilst the values \(m_{ij}\) and \(m_{ji}\) are considered as relative importance indicators (stimulus intensities), so that according to the best knowledge of an expert the significance of \(c_{i}\) equals \(m_{ij}c_{j}\).

Definition 1

A matrix \(M\) is said to be reciprocal if \(\forall i,j\in \{1,\ldots ,n\}:m_{ij}=\frac{1}{m_{ji}}\), and \(M\) is said to be consistent if \(\forall i,j,k\in \{1,\ldots ,n\}:m_{ij}\cdot m_{jk}\cdot m_{ki}=1\).
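
Both properties of Definition 1 are straightforward to verify programmatically. The fragment below is a minimal Python sketch (not part of the original formulation); it assumes the matrix is stored as a plain list of lists and uses an auxiliary tolerance tol, introduced here only to absorb floating-point rounding:

def is_reciprocal(M, tol=1e-9):
    # Definition 1: m_ij = 1 / m_ji for all i, j
    n = len(M)
    return all(abs(M[i][j] - 1.0 / M[j][i]) < tol
               for i in range(n) for j in range(n))

def is_consistent(M, tol=1e-9):
    # Definition 1: m_ij * m_jk * m_ki = 1 for every triad (i, j, k)
    n = len(M)
    return all(abs(M[i][j] * M[j][k] * M[k][i] - 1.0) < tol
               for i in range(n) for j in range(n) for k in range(n))

Note that, with this definition, consistency of a complete matrix implies its reciprocity (take \(i=j=k\) to obtain \(m_{ii}=1\), and then \(k=i\)).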

Since the knowledge stored in the PC matrix usually comes from different professionals in the field of the relation \(R\), it often contains inaccuracies. In such a case, reasoning using the data gathered in \(M\) may give ambiguous results. This observation gave rise to research on the concept of data consistency. In the ideal case, \(M\) is consistent, and there is no doubt as regards the value assigned to the concept \(c_{j}\) if the value assigned to \(c_{i}\) and \(m_{ji}\) are known. Unfortunately, in practice the knowledge in \(M\) is inconsistent, and professionals using the PC method have to deal with this incoherence. Thus, it is important to answer the question of how inconsistent the knowledge in the PC matrix is. There are a number of inconsistency indexes, including the Eigenvector Method (Saaty 2008), the Least Squares Method, the Chi Squares Method (Bozóki and Rapcsak 2008), Koczkodaj’s distance based inconsistency index (Koczkodaj 1993) and others. For the purpose of this article, Koczkodaj’s distance based inconsistency index has been adopted, since it is the only localizing index amongst the above-mentioned indexes.

Definition 2

Koczkodaj’s distance based inconsistency index \(\fancyscript{K}\) of an \(n\times n\) \((n>2)\) reciprocal matrix \(M\) is equal to

$$\begin{aligned} \fancyscript{K}(M)=max\left\{ min\left\{ \left| 1-\frac{m_{ij}}{m_{ik}m_{kj}}\right| ,\left| 1-\frac{m_{ik}m_{kj}}{m_{ij}}\right| \right\} \right\} \end{aligned}$$
(1)

where \(i,j,k=1,\ldots ,n\) and \(i\ne j\wedge j\ne k\wedge i\ne k\).

Intuitively speaking, since in an “ideal” matrix \(\forall i,j,k\in \{1,\ldots ,n\}:m_{ij}\cdot m_{jk}\cdot m_{ki}=1\), Koczkodaj’s index localizes the worst triad, i.e. the one most distant (in the sense of Eq. 1) from this ideal.
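
A brute-force computation of \(\fancyscript{K}(M)\) scans all triads directly. The following Python sketch (an illustration under the same list-of-lists representation as above; the cubic loop is adequate for the small matrices considered here) implements Eq. 1:

def koczkodaj_index(M):
    # K(M): the worst (largest) local inconsistency over all triads (i, j, k)
    n = len(M)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j and j != k and i != k:
                    t = M[i][j] / (M[i][k] * M[k][j])
                    worst = max(worst, min(abs(1.0 - t), abs(1.0 - 1.0 / t)))
    return worst

For a consistent matrix every quotient \(t\) equals \(1\) and the function returns \(0\).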

PC matrices may be over-complete, i.e. there is more than one estimate describing one and the same relation between \(v_{i}\) and \(v_{j}\), but they can also be incomplete, i.e. not all values \(m_{ji}\) are defined. While the first situation can be addressed in many ways, for instance the estimates related to the same pair of concepts can be averaged, the latter is not desirable. It indicates that the model is lacking knowledge, although the missing estimates can be compensated for to some extent (for example, by employing the properties of reciprocity, consistency or transitivity) (Koczkodaj et al. 1999). Hence, for further consideration, it will be assumed that the \(n\times n\) PC matrix \(M\) is complete in the sense specified below.

Definition 3

A matrix \(M\) \((n\times n)\) is said to be complete if \(\forall i,j=1,\ldots ,n:m_{ij}\) is defined.

For the further considerations it is useful to define a graph structure over the set \(C\) and the matrix \(\mathsf M \).

Definition 4

Let a pairwise comparisons graph \(G=(C,E,M)\) be a weighted directed graph over the matrix \(\mathsf M \), where

  • \(C\) is the set of vertices,

  • \(E\subseteq C\times C\) is the set of edges, such that \((v_{i},v_{j})=e\in E\Leftrightarrow m_{ji}\) exists,

  • \(M:E\rightarrow \mathbb R _{+}\) is the function of experts’ assessments, defined as \(M(e)\overset{df}{=}m_{ji}\) for \(e=(v_{i},v_{j})\).

Wherever it does not raise doubts, instead of \(M(e)\wedge e=(u,v)\) the function \(M\) will be written with two arguments i.e. \(M(u,v)\).
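
In code, the graph of Definition 4 reduces to a mapping from edges to weights. The sketch below is an illustrative Python representation (not the paper’s data structure); it follows the convention \(M(v_{i},v_{j})=m_{ji}\), so the weight of the edge from \(v_{i}\) to \(v_{j}\) is the factor by which \(\mu (v_{i})\) is multiplied to obtain a sample of \(\mu (v_{j})\):

def pc_graph(M):
    # Vertices are 0..n-1; the edge (i, j) carries the weight m_ji,
    # i.e. M[j][i] in the row-major list-of-lists representation.
    n = len(M)
    return {(i, j): M[j][i] for i in range(n) for j in range(n) if i != j}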

3 Problem formulation

Since the concepts are linked to each other by the quantitative relation \(R\), then, assuming that the exact values of some concepts are known, the values of the others should be proposed. Thus, let \(\mu :C\nrightarrow \mathbb R _{+}\) be a partial function that assigns to some concepts from \(C\subset \fancyscript{C}\) positive values from \(\mathbb R _{+}\). Hence, the concepts for which the actual value of \(\mu \) is known are denoted by \(C_{K}\subset C\) and called known concepts, whilst the concepts for which \(\mu \) needs to be determined are denoted by \(C_{U}=C\backslash C_{K}\) and called unknown concepts. The relation between different concepts in terms of the function \(\mu \) is represented in the form of the PC matrix \(M\), so that \(m_{ji}\mu (v_{i})=\mu (v_{j})\). Since \(m_{ji}\) usually aims to express how much greater or smaller \(v_{j}\) is than \(v_{i}\) with respect to \(\mu \), by convention it is assumed that the elements of the \(n\times n\) PC matrix are positive real numbers, i.e. \(m_{ji}\in \mathbb R _{+}\) where \(i,j\in \{1,\ldots ,n\}\).

The presented method aims to provide an iterative heuristic estimation algorithm that for all \(v\in C_{U}\) proposes the appropriate value of \(\mu (v)\). In this approach \(m_{ji}\mu (v_{i})\) is treated as a sample of \(\mu (v_{j})\), hence the expected value of \(\mu (v_{j})\) is the arithmetic mean of values \(m_{ji}\mu (v_{i})\). The algorithm is iterative and sets the new expected values based on the ones previously determined. It stops either when it reaches a fixed number of iterations or (if convergent) when calculations reach the desired accuracy. Although during the course of the algorithm the new values for \(\mu (v)\) and \(v\in C_{U}\) are calculated, \(M\) remains unchanged and serves as reference data.
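
Written explicitly, the update applied to an unknown concept \(v_{j}\) is simply the sample mean (here \(P(v_{j})\) is an auxiliary notation, introduced only for this restatement, for the set of predecessors \(v_{i}\) of \(v_{j}\) for which \(\mu (v_{i})\) has already been determined):

$$\begin{aligned} \mu (v_{j})=\frac{1}{\#P(v_{j})}\underset{v_{i}\in P(v_{j})}{\sum }m_{ji}\,\mu (v_{i}) \end{aligned}$$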

Listing 1 The heuristic rating estimation (HRE) procedure (pseudocode; referred to below by its line numbers)

4 Heuristic rating estimation algorithm

The principle of operation of the rating estimation algorithm (Listing 1) is to iteratively assign the value \(\mu (u)\) to every unknown vertex \(u\in C_{U}\) by calculating the mean of its samples, and then to choose from among all the calculated estimations the one which is optimal. The idea of the procedure comes from the BFS algorithm (Cormen et al. 2009), which traverses the graph of interest layer by layer. In the presented approach, the estimates for the next layer are computed on the basis of the previous layer, assuming that vertex repetition in different steps is allowed. The algorithm stops either when the appropriate layer is reached or, if it converges for the given \(G\), when the distance in the sense of the chosen metric (see Eq. 47) between the subsequent estimates for elements in \(C_{U}\) is smaller than some desired \(\epsilon \). If the algorithm does not converge (when \(G\) is fixed) then the number of steps to be made is explicitly set at the beginning of the estimation procedure. In the case of standard complete PC matrices the graph is a directed clique, in which every vertex is connected to every other vertex. In such a case even in the first step all the vertices are visited, whilst in the second step the computed estimations may take into account all possible values gathered in \(M\) except those describing ratios between elements in \(C_{K}\). Thus, when the algorithm is not convergent, a few iterations of the HRE procedure seem to be a reasonable choice.

The main principle of the algorithm (Listing 1) is quite straightforward. It starts by assigning the followers (in the sense of \(E\)) of all the elements of \(C_{K}\) to the set \(L\) (Listing: 1, line: 2). For each \(u\) for which \(\mu (u)\) is unknown, all its predecessors in \(E\) are scanned. If \(\mu (v)\) is already known, then \(v\) is taken into account during the mean computation (Listing: 1, line: 10). Then, within two nested loops, the current level is traversed (the outer loop, Listing: 1, lines: 6–20) and the appropriate estimates are computed (the inner loop, Listing: 1, lines: 8–12). The predicate stop becomes false when either the auxiliary variable \(l\) reaches the assumed number of levels or it holds that \(l>1\) and \(\rho (x_{l},x_{l+1})<\epsilon \), where \(x_{l}=(\mu _{old}(c_{u_{1}}),\ldots ,\mu _{old}(c_{u_{k}}))\), \(x_{l+1}=(\mu (c_{u_{1}}),\ldots ,\mu (c_{u_{k}}))\), the concepts \(c_{u_{i}}\in C_{U}\), and \(\rho \) is one of the metrics defined in (Eq. 47). Otherwise it is true, and the outer loop continues traversing \(G\). As long as \(L\) is not empty the inner loop proceeds as follows: it removes one element from the set \(L\) and assigns it to the variable \(u\) (Listing: 1, line: 9), forms the auxiliary set \(T\) containing the input edges of \(u\) whose beginnings are predecessors of \(u\) in \(E\) for which \(\mu \) is already known, and adds to the auxiliary mapping \(tmp:C_{U}\rightarrow \mathbb R _{+}\) the pair consisting of \(u\) and the mean of the values \(\mu (v)M(v,u)\) taken over the edges \((v,u)\in T\) (Listing: 1, line: 11). When all the elements of \(L\) have been processed the inner loop ends. Next the outer loop rewrites the auxiliary mapping \(tmp\) to the result map \(\mu \) (Listing: 1, lines: 13–14). Since for all the concepts that were previously in \(L\) the function \(\mu \) is now known, the next level is devoted to the calculation of \(\mu \) for their followers, hence the set \(L\) is refilled (Listing: 1, lines: 15–16). At the end of the outer loop the step counter is incremented and the auxiliary mapping \(tmp\) is emptied. When the outer loop completes its operation, the set \(Est\) contains a sequence of subsequent sets of estimates for the concepts from \(C_{U}\). Then, at the end of the procedure (Listing: 1, line: 21) an optimal set of estimates needs to be chosen. For the purposes of this work it was assumed that the optimal set of estimates is the one for which the average of the absolute mean estimation errors is minimal (Eq. 4). Let us define the absolute mean error \(e_{\mu }(u)\) as an error indicator for some concept \(u\) and mapping \(\mu \) as follows:

$$\begin{aligned} e_{\mu }(u)=\frac{1}{n}\overset{n}{\underset{i=1}{\sum }}\left| \mu (u)-\mu (v_{i})\cdot M(v_{i},u)\right| \end{aligned}$$
(2)

where \(u\in C_{U}\), \((v_{i},u)\in E\), \(\mu (v_{i})\) is already defined, and \(n\) denotes the number of such predecessors \(v_{i}\). Then the average error for all unknown concepts with respect to \(\mu \) is defined as:

$$\begin{aligned} e_{\mu }(C_{U})=\frac{1}{\#C_{U}}\underset{c\in C_{U}}{\sum }e_{\mu }(c) \end{aligned}$$
(3)

hence

$$\begin{aligned} \text{ choose }\_\text{ optimal }\_\text{ est }(Est)=\underset{\mu \in Est}{\arg \min }\left\{ e_{\mu }(C_{U})\right\} \end{aligned}$$
(4)

In every iteration of the algorithm every weight \(M(v,u)\) (Listing: 1, line: 11) is considered exactly once. Hence, the running time of the outer loop is \(O(r\cdot |E|)=O(r\cdot n^{2})\) where \(r\) is the number of iterations, and \(n\) is the size of \(C\). Similarly, the calculation of the average error (Eq. 3) requires the consideration of every \(M(v,u)\) exactly once, hence computing the formula (Eq. 4) also requires at most \(r\cdot n^{2}\) steps. Thus, the overall running time of the HRE algorithm is also \(O(r\cdot n^{2})\).
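
For reference, the whole procedure can be condensed into a short Python sketch. It is an illustration only: the function name, the dictionary-based bookkeeping and the parameters max_levels and eps are choices made here and do not reproduce the pseudocode of Listing 1 line by line.

def hre(M, known, max_levels=2, eps=None):
    # M      -- complete n x n PC matrix, M[i][j] = m_ij
    # known  -- dict {vertex: mu(vertex)} for the concepts in C_K
    # Returns the estimate set for C_U with the smallest average
    # absolute mean error (Eqs. 2-4).
    n = len(M)
    unknown = [u for u in range(n) if u not in known]
    mu = dict(known)
    Est = []
    for _ in range(max_levels):
        # one level: every sample mu(v) * M(v, u) = mu(v) * m_uv is averaged
        new = {u: sum(mu[v] * M[u][v] for v in mu if v != u)
                  / sum(1 for v in mu if v != u)
               for u in unknown}
        Est.append(new)
        if eps is not None and len(Est) > 1 and \
           max(abs(Est[-1][u] - Est[-2][u]) for u in unknown) < eps:
            break
        mu.update(new)
    def avg_error(est):
        # Eqs. 2 and 3 with respect to the estimate set 'est'
        full = dict(known)
        full.update(est)
        errs = [sum(abs(est[u] - full[v] * M[u][v]) for v in range(n) if v != u)
                / (n - 1) for u in unknown]
        return sum(errs) / len(errs)
    return min(Est, key=avg_error)        # Eq. 4

For the example of Sect. 5, hre(M, {3: 6, 4: 2}) recomputes level by level the same values that are derived by hand below, and returns the generated set for which the average error (Eq. 3) is the smallest.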

5 Numerical example

Let us illustrate the algorithm defined above by a simple numerical example. Some state agency supports five innovative projects \(u_{1},\ldots ,u_{5}\). After a while two of them, \(u_{4}\) and \(u_{5}\), come to an end and their actual costs become known: \(\mu (u_{4})=6,\, \mu (u_{5})=2\). As both projects exceeded their initial budgets, the agency wants to re-estimate the expected costs of the other projects using the already acquired knowledge. For this purpose the agency organizes a panel of experts, during which a PC matrix \(M\) is formed (Eq. 5) reflecting the predicted cost relations between all five projects.

$$\begin{aligned} \mathsf M =\left[ \begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} 1 &{} \frac{3}{5} &{} \frac{3}{4} &{} \frac{1}{2} &{} \frac{4}{3}\\ \frac{5}{4} &{} 1 &{} \frac{5}{4} &{} \frac{11}{12} &{} \frac{5}{2}\\ \frac{6}{5} &{} \frac{4}{5} &{} 1 &{} \frac{1}{2} &{} 2\\ \frac{3}{2} &{} \frac{6}{5} &{} \frac{6}{5} &{} 1 &{} 3\\ \frac{2}{3} &{} \frac{3}{5} &{} \frac{4}{7} &{} \frac{1}{3} &{} 1 \end{array}\right] \end{aligned}$$
(5)

Since the experts were assigned to the pairs \((u_{i},u_{j})\) randomly and did not know each other’s estimates, the matrix \(\mathsf M \) is neither reciprocal nor consistent. Based on these data the agency must calculate adequate cost estimates for the projects \(u_{1},u_{2}\) and \(u_{3}\) and make appropriate decisions regarding their future. The reasoning presented below shows how to calculate the estimated costs of the projects using the HRE procedure (Listing: 1).

The matrix \(M\) is used to generate the graph \(G\), the set of known concepts \(C_{K}\) is formed by the projects \(u_{4}\) and \(u_{5}\), and finally the unknown concept set is \(C_{U}=\{u_{1},u_{2},u_{3}\}\). Since all the concepts are reachable in the first step, and during the second step the \(\mu \) mapping is computed for every \(c\in C_{U}\) using all the other concepts in \(C\), the number of levels traversed is limited to \(2\) for the practical demonstration of the algorithm. In fact, for the matrix \(M\) as given (Eq. 5) the subsequent estimations calculated by HRE converge. Thus, instead of limiting the number of steps, an appropriately small \(\epsilon \) could be chosen. This case will be discussed later, after the convergence criterion for HRE is defined.

Let us assume that the first vertex considered during the first step of the algorithm is \(u_{1}\). Then, according to the presented procedure \(\mu (u_{1})\) is computed as follows:

$$\begin{aligned} \mu (u_{1})=\frac{1}{2}\cdot \left( \mu (u_{4})\cdot \mathsf M (\mathsf u _{4},\mathsf u _{1}) +\mu (u_{5})\cdot \mathsf M (\mathsf u _{5},\mathsf u _{1})\right) \end{aligned}$$
(6)

hence,

$$\begin{aligned} \mu (u_{1})=\frac{1}{2}\cdot \left( 6\cdot \frac{1}{2} +2\cdot \frac{4}{3}\right) =\frac{17}{6}\approx 2.83 \end{aligned}$$
(7)

and further, in the same way:

$$\begin{aligned} \mu (u_{2})&= \frac{1}{2}\cdot \left( 6\cdot \frac{11}{12} +2\cdot \frac{5}{2}\right) =\frac{21}{4}=5.25\end{aligned}$$
(8)
$$\begin{aligned} \mu (u_{3})&= \frac{1}{2}\cdot \left( 6\cdot \frac{1}{2} +2\cdot 2\right) =\frac{7}{2}=3.5 \end{aligned}$$
(9)

The first turn of the outer loop (traversing the first level of \(G\)) brings estimations for all the vertices in \(C_{U}\). Thus, although the estimation process could be terminated here, the agency concludes that it is better to take more data into account (for instance, in order to exclude accidental errors of some experts) and decides to perform the second iteration of the algorithm. The new values \(\mu (u_{1}),\mu (u_{2})\) and \(\mu (u_{3})\) are calculated below.

$$\begin{aligned} \mu (u_{1})&= \frac{1}{4}\cdot \left( \frac{21}{4}\cdot \frac{3}{5} +\frac{7}{2}\cdot \frac{3}{4}+6\cdot \frac{1}{2} +2\cdot \frac{4}{3}\right) =\frac{1373}{480}\approx 2.86\end{aligned}$$
(10)
$$\begin{aligned} \mu (u_{2})&= \frac{1}{4}\cdot \left( \frac{17}{6}\cdot \frac{5}{4} +\frac{7}{2}\cdot \frac{5}{4}+6\cdot \frac{11}{12} +2\cdot \frac{5}{2}\right) =\frac{221}{48}\approx 4.6\end{aligned}$$
(11)
$$\begin{aligned} \mu (u_{3})&= \frac{1}{4}\cdot \left( \frac{17}{6}\cdot \frac{6}{5} +\frac{21}{4}\cdot \frac{4}{5}+6\cdot \frac{1}{2}+2\cdot 2\right) = \frac{73}{20}\approx 3.65 \end{aligned}$$
(12)

Recognizing the achieved result as optimal, the agency finishes the algorithm. It is worth noting that two of the entries in the matrix \(M\) (Eq. 5) do not come from the experts and have been introduced only for the matrix completeness. Namely, since the values \(\mu (u_{4})\) and \(\mu (u_{5})\) were previously known, the values \(m_{45}\) and \(m_{54}\) were calculated as the ratios \(\frac{\mu (u_{4})}{\mu (u_{5})}=3\) and \(\frac{\mu (u_{5})}{\mu (u_{4})}=\frac{1}{3}\), respectively.
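
The two iterations above can be replayed mechanically. The short Python sketch below (using exact fractions and 0-based indices, with indices 3 and 4 standing for the known projects \(u_{4}\) and \(u_{5}\); the variable names are arbitrary) prints the values of Eqs. 7–9 and then of Eqs. 10–12:

from fractions import Fraction as F

M = [[F(1),    F(3, 5), F(3, 4), F(1, 2),   F(4, 3)],
     [F(5, 4), F(1),    F(5, 4), F(11, 12), F(5, 2)],
     [F(6, 5), F(4, 5), F(1),    F(1, 2),   F(2)],
     [F(3, 2), F(6, 5), F(6, 5), F(1),      F(3)],
     [F(2, 3), F(3, 5), F(4, 7), F(1, 3),   F(1)]]
mu = {3: F(6), 4: F(2)}                      # mu(u4) = 6, mu(u5) = 2
for level in (1, 2):
    new = {u: sum(mu[v] * M[u][v] for v in mu if v != u)
              / sum(1 for v in mu if v != u)
           for u in (0, 1, 2)}
    mu.update(new)
    print(level, [float(new[u]) for u in (0, 1, 2)])
# level 1: approximately [2.833, 5.25, 3.5]     (Eqs. 7-9)
# level 2: approximately [2.860, 4.604, 3.65]   (Eqs. 10-12)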

6 Data consistency and estimation error

For any algorithm whose result is inexact, natural questions concern the accuracy of the resulting solution and what should be done to improve it. The presented procedure is based on the fundamental sample mean estimation equation (Walpole 2012): for the purpose of the algorithm the product \(\mu (u_{i})\cdot M(u_{i},u_{j})\) is treated as a sample, whilst \(\mu (u_{j})\) denotes the expected value inferred from the samples. In this case the natural measure of the accuracy of the algorithm’s output is the estimation error, understood as the distance between a sample and the mean. Of course, the smaller the error, the more accurate the result. According to the popular adage “garbage in, garbage out”, even the best algorithm cannot be expected to provide good output if the input data are bad. Hence, it might be expected that the estimation errors of HRE depend on data consistency and decrease as the data inconsistency decreases. The reasoning below supports this assertion.

Theorem 1

For a complete PC matrix \(M\), and PC graph \(G\) over \(M\) it holds that

$$\begin{aligned} \fancyscript{K}(M)\rightarrow 0\Rightarrow e_{\mu }(u)\rightarrow 0 \end{aligned}$$
(13)

where \(\fancyscript{K}(M)\) is Koczkodaj’s distance based inconsistency index for \(M\), \(e_{\mu }(u)\) is the absolute mean estimation error for \(u\in C_{U}\), and \(\mu \) is any estimation provided by the HRE procedure.

Proof

Let us consider some element \(u\in C\) representing an initially unknown concept, i.e. \(u\in C_{U}\). Let \(v_{1},\ldots ,v_{n}\) be its predecessors in \(G\), so that \((v_{i},u)\in E\wedge i=1,\ldots ,n\), for which the values \(\mu (v_{i})\) are known. Since the matrix \(M\) is complete, there must exist edges \((v_{1},v_{2}),\ldots ,(v_{1},v_{n})\in E\) so that \(b_{2}=M(v_{1},v_{2}),b_{3}=M(v_{1},v_{3}),\ldots ,b_{n}=M(v_{1},v_{n})\). Moreover, let us denote the weights of the edges between the \(v_{i}\)’s and \(u\) as \(a=M(v_{1},u)\) and \(c_{2}=M(v_{2},u),\ldots ,c_{n}=M(v_{n},u)\) (see Fig. 1). Following Eq. 1, Koczkodaj’s distance based inconsistency index \(\fancyscript{K}(M)\), in short \(\fancyscript{K}\), is the maximal local inconsistency, attained for some triad of three distinct vertices \(t_{1},t_{2},t_{3}\in C\). Thus, in the case of triads composed of the concepts \(v_{1},v_{i},u\) it must hold that:

$$\begin{aligned} \fancyscript{K}\ge min\left\{ \left| 1-\frac{a}{b_{i}c_{i}}\right| ,\left| 1-\frac{b_{i}c_{i}}{a}\right| \right\} \end{aligned}$$
(14)

for all \(i=2,\ldots ,n\). This implies that one of the following two statements is true:

$$\begin{aligned} a\le b_{i}c_{i}\wedge \fancyscript{K}\ge 1 -\frac{a}{b_{i}c_{i}}\end{aligned}$$
(15)
$$\begin{aligned} b_{i}c_{i}\le a\wedge \fancyscript{K}\ge 1-\frac{b_{i}c_{i}}{a} \end{aligned}$$
(16)

Let us denote \(\alpha \overset{df}{=}1-\fancyscript{K}\); then the statements above can be written in the form:

$$\begin{aligned}&a\le b_{i}c_{i}\wedge \frac{1}{\alpha }\cdot a\ge b_{i}c_{i}\end{aligned}$$
(17)
$$\begin{aligned}&b_{i}c_{i}\le a\wedge b_{i}c_{i}\ge \alpha a \end{aligned}$$
(18)

Combining these two expressions (Eqs. 17 and 18) we obtain:

$$\begin{aligned} a\le b_{i}c_{i}\le \frac{1}{\alpha }\cdot a\, \vee \, \alpha a\le b_{i}c_{i}\le a \end{aligned}$$
(19)

Let us denote \(\beta _{1}\overset{df}{=}a\) and \(\beta _{i}\overset{df}{=}b_{i}\cdot c_{i}\) for \(i=2,\ldots ,n\); thus

$$\begin{aligned} a\le \beta _{i}\le \frac{1}{\alpha }\cdot a\, \vee \, \alpha a\le \beta _{i}\le a \end{aligned}$$
(20)

Since \(\alpha \le 1\) (see Eqs. 15 and 16), the statement (Eq. 20) implies:

$$\begin{aligned} \alpha a\le \beta _{i}\le \frac{1}{\alpha }\cdot a \end{aligned}$$
(21)

and of course

$$\begin{aligned} \alpha a\mu (v_{1})\le \beta _{i}\mu (v_{1})\le \frac{1}{\alpha }\cdot a\mu (v_{1}) \end{aligned}$$
(22)

During the first step of the algorithm \(\mu \) is defined only for the known concepts \(v\in C_{K}\). Thus, all the concepts \(v_{1},\ldots ,v_{n}\) are in \(C_{K}\), since only such elements are taken into account when calculating the value of \(\mu (u)\). It is assumed that the value of the ratio \(M(v_{i},v_{j})=m_{ji}\) for two a priori known concepts \(v_{i},v_{j}\in C_{K}\) corresponds to the actual fraction \(\frac{\mu (v_{j})}{\mu (v_{i})}\), thus it holds that \(\mu (v_{1})\cdot b_{i}=\mu (v_{i})\). Therefore the update equation for \(\mu (u)\) (Listing: 1, line: 11) can be written in the form:

$$\begin{aligned} \mu (u)=\frac{1}{n}\left( \beta _{1}+\cdots + \beta _{n}\right) \mu (v_{1}) \end{aligned}$$
(23)

Since every component \(\beta _{i}\) is bounded (Eq. 21) their mean must also be within the same bounds, which leads to the conclusion that:

$$\begin{aligned} \alpha \cdot a\cdot \mu (v_{1})\le \mu (u)\le \frac{1}{\alpha }\cdot a\cdot \mu (v_{1}) \end{aligned}$$
(24)

The absolute estimation error for some \(u\) with respect to \(\mu \) at the end of the first step of the algorithm (Listing: 1) can be written and bounded from above as follows:

$$\begin{aligned} e_{1}(u)\!=\!\frac{1}{n}\overset{n}{\underset{i=1}{\sum }}\left| \mu (u) \!-\!\beta _{i}\mu (v_{1})\right| \le \underset{j=1,\ldots ,n}{max}\left| \mu (u) \!-\!\beta _{j}\cdot \mu (v_{1})\right| \!=\!\left| \mu (u)- \beta _{k}\cdot \mu (v_{1})\right| \nonumber \\ \end{aligned}$$
(25)

where \(k\in \{1,\ldots ,n\}\). Because both components of the absolute difference on the right side of the expression 25 have the same lower and upper bounds (Eqs. 22 and 24), then the maximal possible distance between them is limited by the difference between their upper and lower bounds. Thus,

$$\begin{aligned} \left| \mu (u)-\beta _{k}\cdot \mu (v_{1})\right| \le \frac{1}{\alpha }a\mu (v_{1})-\alpha a\mu (v_{1})=a\mu (v_{1})\left( \frac{1}{\alpha }-\alpha \right) \end{aligned}$$
(26)

then the absolute mean error for the purpose of traversing the first level of \(G\) is upper bounded by:

$$\begin{aligned} e_{1}(u)\le a\mu (v_{1})\left( \frac{1}{\alpha }-\alpha \right) \end{aligned}$$
(27)

The second level is a bit more complicated. Under the conditions of the algorithm there is \(u\in C_{U}\), and \(M\) is complete. Thus, there exists at least one known concept \(v\in C_{K}\) which precedes \(u\) in \(E\), i.e. \((v,u)\in E\). Let us put \(v_{1}=v\). This means that during the first step of the algorithm either \(v_{i}\) (for \(i=2,\ldots ,n\)) was in \(C_{K}\), in which case obviously \(\mu (v_{1})\cdot b_{i}=\mu (v_{i})\), or \(\mu (v_{i})\) can be bounded using Eq. 24 (note that in order to use Eq. 24, \(a\) needs to be replaced by \(b_{i}\)). This leads to the following inequality:

$$\begin{aligned} \alpha \cdot b_{i}\cdot \mu (v_{1})\le \mu (v_{i})\le \frac{1}{\alpha }\cdot b_{i}\cdot \mu (v_{1}) \end{aligned}$$
(28)

thus,

$$\begin{aligned} \alpha \cdot \beta _{i}\cdot \mu (v_{1})\le c_{i}\cdot \mu (v_{i})\le \frac{1}{\alpha }\cdot \beta _{i}\cdot \mu (v_{1}) \end{aligned}$$
(29)

Since Eq. 21 is valid for each triad, it follows that

$$\begin{aligned} \alpha ^{2}\cdot a\cdot \mu (v_{1})\le c_{i}\cdot \mu (v_{i})\le \frac{1}{\alpha ^{2}}\cdot a\cdot \mu (v_{1}) \end{aligned}$$
(30)

and then, writing \(c_{1}\overset{df}{=}a\) for uniformity of notation,

$$\begin{aligned} \alpha ^{2}\cdot a\cdot \mu (v_{1})\le \frac{\left( c_{1}\cdot \mu (v_{1})+\cdots +c_{n}\cdot \mu (v_{n})\right) }{n}\le \frac{1}{\alpha ^{2}}\cdot a\cdot \mu (v_{1}) \end{aligned}$$
(31)

which provides the estimation for \(\mu (u)\) for the purpose of the second step of the algorithm:

$$\begin{aligned} \alpha ^{2}\cdot a\cdot \mu (v_{1})\le \mu (u)\le \frac{1}{\alpha ^{2}}\cdot a\cdot \mu (v_{1}) \end{aligned}$$
(32)

Once again, the absolute error with respect to \(\mu \) at the end of the second step can be written and upper bounded as follows:

$$\begin{aligned} e_{2}(u)\!=\!\frac{1}{n}\overset{n}{\underset{i=1}{\sum }}\left| \mu (u)\!-\!\mu (v_{i})\cdot c_{i}\right| \le \underset{i=1,\ldots ,n}{max}\left| \mu (u)\!-\!\mu (v_{i})\cdot c_{i}\right| \!=\!\left| \mu (u)-\mu (v_{k})\cdot c_{k}\right| \nonumber \\ \end{aligned}$$
(33)

where \(k\in \{1,\ldots ,n\}\). Then, similarly as in the first step, both components have the same upper and lower bounds (Eqs. 30 and 32), thus the distance between them must not be greater than the distance between these limits. So,

$$\begin{aligned} e_{2}(u)\!\le \!\left| \mu (u)\!-\!\mu (v_{k})\cdot c_{k}\right| \!\le \!\frac{1}{\alpha ^{2}}a\mu (v_{1})-\alpha ^{2}a\mu (v_{1})\!=\!a\mu (v_{1})\left( \frac{1}{\alpha ^{2}}\!-\!\alpha ^{2}\right) \qquad \end{aligned}$$
(34)

which means that the absolute mean error in the second step of the algorithm is bounded from above as follows:

$$\begin{aligned} e_{2}(u)\le a\mu (v_{1})\cdot \left( \frac{1}{\alpha ^{2}}-\alpha ^{2}\right) \end{aligned}$$
(35)

Let us consider the \(r\)-th step of the algorithm. As in the second step, we consider \(u\in C_{U}\wedge v_{1}\in C_{K}\) where \((v_{1},u)\in E\) (see Fig. 1). Let us assume by induction that every \(\mu (v_{i})\) for \(i=2,\ldots ,n\) is bounded as follows:

$$\begin{aligned} \alpha ^{r-1}\cdot b_{i}\cdot \mu (v_{1})\le \mu (v_{i})\le \frac{1}{\alpha ^{r-1}}\cdot b_{i}\cdot \mu (v_{1}) \end{aligned}$$
(36)

(compare with Eq. 28). Then, by repeating the same reasoning as for the second step (Eqs. 29–35), we come to the conclusion that:

$$\begin{aligned} e_{r}(u)\le a\mu (v_{1})\cdot \left( \frac{1}{\alpha ^{r}}-\alpha ^{r}\right) \end{aligned}$$
(37)

Due to the principle of induction the above inequality holds for every \(r\in \mathbb N _{+}\).

Since every \(e_{r}(u)\) is a mean of absolute values, it cannot be negative, i.e.

$$\begin{aligned} 0\le e_{r}(u) \end{aligned}$$
(38)

Moreover, the definition of \(\alpha \) implies

$$\begin{aligned} \fancyscript{K}\rightarrow 0\Rightarrow \alpha \rightarrow 1 \end{aligned}$$
(39)

Thus, when \(\alpha \rightarrow 1\), the whole right side of the inequality 37 also approaches \(0\). Therefore, for every step \(r>0\) it is true that:

$$\begin{aligned} \fancyscript{K}\rightarrow 0\Rightarrow e_{r}(u)\rightarrow 0 \end{aligned}$$
(40)

The above assertion, in the light of the arbitrary choice of \(u\in C_{U}\), proves the thesis of the theorem.\(\square \)

Fig. 1 Triads of vertices with \(u\)

7 Convergence of solution

One of the immediate questions to come up concerns the optimal number of iterations. The answer is not obvious, since the result of the algorithm depends on the input data. In particular, it is easy to construct simple graphs over an inconsistent PC matrix \(M\) for which every further step of the algorithm significantly increases the absolute error of estimation. Consider, for instance, the graph \(G=(C_{K}\cup C_{U},E,M)\) where \(C_{K}=\{v_{1}\}\), \(C_{U}=\{v_{2},v_{3}\}\), \(\{(v_{1},v_{2}),(v_{1},v_{3}),(v_{2},v_{3}),(v_{3},v_{2})\}\subseteq E\) and the values \(M(v_{1},v_{2}),\, M(v_{1},v_{3}),\,M(v_{2},v_{3}),\, M(v_{3},v_{2})\) are either all less than \(1\) or all greater than \(1\). In the first case the subsequent estimates \(\mu (v_{2})\) and \(\mu (v_{3})\) tend to \(0\); conversely, if all the values are greater than \(1\), then \(\mu (v_{2})\) and \(\mu (v_{3})\) tend to \(\infty \). In the general case, the inequality 37 (Th. 1) suggests that with every subsequent step the errors may increase. This leads to the conclusion that the optimal strategy is “the fewer steps the better”. Therefore, if the algorithm is not convergent, traversing at most one or two levels seems like a good idea.
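
This divergent behaviour is easy to observe numerically. In the hypothetical three-concept Python sketch below, all four expert ratios are set to \(2\) (a value chosen here only for illustration) and the estimates for \(v_{2}\) and \(v_{3}\) grow without bound:

M = [[1, 2, 2],
     [2, 1, 2],
     [2, 2, 1]]               # every off-diagonal ratio equals 2
mu = {0: 1.0}                 # C_K = {v1} with mu(v1) = 1
for step in range(5):
    new = {u: sum(mu[v] * M[u][v] for v in mu if v != u)
              / sum(1 for v in mu if v != u)
           for u in (1, 2)}
    mu.update(new)
    print(step + 1, new)      # both estimates grow: 2.0, 3.0, 4.0, 5.0, 6.0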

If there are some edges with weights below \(1\) and some edges with weights above \(1\), the behavior of the algorithm is not so obvious. During the conducted experiments it turned out that very often the subsequent iterations produce series of estimations converging to some fixed positive values. To explain this phenomenon, let us write the algorithm in the form of an appropriate system of equations describing the one-step calculations (for the second step and the following ones). For this purpose, let us assume that the first iteration of the procedure has been performed, hence the values \(\mu (v)\) are assigned to all the concepts \(v\in C\). For simplicity, let us assume that \(C_{U}=\{c_{1},\ldots ,c_{k}\},\, C_{K}=\{c_{k+1},\ldots ,c_{n}\}\) and denote \(b_{i}\) for all \(i=1,\ldots ,k\) as

$$\begin{aligned} b_{i}=\frac{1}{n-1}M(c_{k+1},c_{i})\mu (c_{k+1}) +\cdots +\frac{1}{n-1}M(c_{n},c_{i})\mu (c_{n}) \end{aligned}$$
(41)

Thus, during the second and subsequent iterations the algorithm calculates the new estimation value \(\mu (c_{i})\) for each unknown concept \(c_{i}\in C_{U}\) according to one of the following equations:

$$\begin{aligned} \mu (c_{1}) \!&= \! \frac{1}{n-1}M(c_{2},c_{1})\mu (c_{2})+\cdots +\frac{1}{n-1}M(c_{k},c_{1})\mu (c_{k})+b_{1}\nonumber \\ \mu (c_{2}) \!&= \! \frac{1}{n-1}M(c_{1},c_{2})\mu (c_{1})\!+\!\frac{1}{n-1}M(c_{3},c_{2})\mu (c_{3})\!+\!\cdots \!+\!\frac{1}{n-1}M(c_{k},c_{2})\mu (c_{k})\!+\!b_{2}\nonumber \\&\cdots \\ \mu (c_{k})&= \frac{1}{n-1}M(c_{1},c_{k})\mu (c_{1})+\cdots +\frac{1}{n-1}M(c_{k-1},c_{k})\mu (c_{k-1})+b_{k}\nonumber \end{aligned}$$
(42)

Let us denote (note that the self-comparison values \(M(c_{i},c_{i})=1\) never occur in the update equations above, so the diagonal coefficients may be set to zero):

$$\begin{aligned} a_{ij}=\frac{1}{n-1}M(c_{j},c_{i})\wedge i\ne j\quad \text{ and }\quad a_{ii}=0\quad \text{ and }\quad \mu (c_{i})=x_{i} \end{aligned}$$
(43)

Then, the equation system takes the form:

$$\begin{aligned} \begin{array}{llllll} x_{1} &{} -a_{11}x_{1} &{} -a_{12}x_{2} &{} -\ldots &{} -a_{1k}x_{k} &{} =b_{1}\\ x_{2} &{} -a_{21}x_{1} &{} -a_{22}x_{2} &{} -\ldots &{} -a_{2k}x_{k} &{} =b_{2}\\ \ldots &{} \ldots &{} \ldots &{} \ldots &{} \ldots &{} \ldots \\ x_{k} &{} -a_{k1}x_{1} &{} -a_{k2}x_{2} &{} -\ldots &{} -a_{kk}x_{k} &{} =b_{k} \end{array} \end{aligned}$$
(44)

Let us define the operator \(T:\mathbb R ^{k}\rightarrow \mathbb R ^{k}\) (see Bronstein et al. 2005) as follows:

$$\begin{aligned} Tx=\left( \overset{k}{\underset{r=1}{\sum }}a_{1r}x_{r} +b_{1},\ldots ,\overset{k}{\underset{r=1}{\sum }}a_{kr}x_{r} +b_{k}\right) ^{T} \end{aligned}$$
(45)

The considered equation system can be written in the form of a fixed point problem in the metric space \(\mathbb R ^{k}\):

$$\begin{aligned} x=Tx \end{aligned}$$
(46)

Let us assume one of the following metrics:

$$\begin{aligned} \rho (x,y)\!=\!\sqrt{\overset{k}{\underset{i=1}{\sum }}|\xi _{i} -\eta _{i}|^{2}},\quad \rho (x,y)\!=\!\underset{1\le i\le k}{max}|\xi _{i}\!-\!\eta _{i}|,\quad \rho (x,y)\!=\! \overset{k}{\underset{i=1}{\sum }}|\xi _{i}\!-\! \eta _{i}|\quad \end{aligned}$$
(47)

where \(x=(\xi _{1},\ldots ,\xi _{k})\) and \(y=(\eta _{1},\ldots ,\eta _{k})\). It holds that if at least one of the following quantities \(Q_{1},Q_{2}\) or \(Q_{3}\) is less than \(1\),

$$\begin{aligned} Q_{1}=\sqrt{\overset{k}{\underset{i,j=1}{\sum }}|a_{ij}|^{2}},\quad Q_{2}=\underset{1\le i\le k}{max}\overset{k}{\underset{j=1}{\sum }}|a_{ij}|,\quad Q_{3}=\underset{1\le j\le k}{max}\overset{k}{\underset{i=1}{\sum }}|a_{ij}| \end{aligned}$$
(48)

then \(T\) turns out to be a contracting operator (Bronstein et al. 2005). According to the Banach fixed point theorem, \(T\) then has exactly one fixed point. Hence, in such a case there is exactly one set of values \(\mu (c_{1}),\ldots ,\mu (c_{k})\) which is the limit of the sequence of HRE estimations.

Due to the form of the coefficient \(a_{ij}=\frac{M(c_{j},c_{i})}{n-1}\), the value \(a_{ij}\) is the smaller, the larger \(C\) is and the smaller \(M(c_{j},c_{i})\) is. Moreover, the conditions (Eq. 48) can be more easily met with fewer concepts in \(C_{U}\) (that is because the summations embedded in the conditions (Eq. 48) include fewer elements). In other words, the estimation procedure has a high chance of being convergent if

  1. The set \(C_{U}\) is relatively small (\(C_{K}\) is relatively large),

  2. The estimated values \(\mu (v)\) are similar.

Both of these conditions are quite intuitive and, in practice, are likely to be satisfied. The first of them reflects the natural desire to provide the experts with a larger rather than a smaller number of known reference concepts. The second corresponds to the common-sense observation that all the considered concepts should be similar to each other, because then it is easy to compare them. In other words, the expert estimates are more reliable when the compared projects are more similar.

The convergence of the algorithm implies the convergence of the estimation error. Unfortunately, the limit towards which the estimation error tends is not necessarily the smallest possible error value. Hence, in order to minimize the error, the user needs to choose the best estimation from all the estimations generated during the course of the algorithm (Listing: 1, line: 21).

In the case of the previously considered example (Sect. 5) all three quantities \(Q_{1},Q_{2}\) and \(Q_{3}\) are below 1. Thus, the algorithm is convergent and the computed limits for \(\mu (u_{1}),\mu (u_{2})\) and \(\mu (u_{3})\) are 2.758, 4.578 and 3.493, respectively. The absolute mean estimation errors converge to \(e_{\mu }(u_{1})=0.12,\, e_{\mu }(u_{2})=0.631\) and \(e_{\mu }(u_{3})=0.338\), and are minimal with respect to the mean of errors \(e_{\mu }(C_{U})\) (Eq. 3).
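
Both the convergence check and the limit itself can be verified directly. The sketch below is an illustrative Python fragment (not part of the original text); the arrays A and b follow Eqs. 41–43 with \(n=5\), \(k=3\), and 0-based indices 3 and 4 standing for the known concepts:

M = [[1, 3/5, 3/4, 1/2, 4/3],
     [5/4, 1, 5/4, 11/12, 5/2],
     [6/5, 4/5, 1, 1/2, 2],
     [3/2, 6/5, 6/5, 1, 3],
     [2/3, 3/5, 4/7, 1/3, 1]]
known = {3: 6.0, 4: 2.0}
n, k = 5, 3
A = [[0.0 if i == j else M[i][j] / (n - 1) for j in range(k)] for i in range(k)]
b = [sum(M[i][j] * known[j] for j in known) / (n - 1) for i in range(k)]

Q1 = sum(a * a for row in A for a in row) ** 0.5
Q2 = max(sum(abs(a) for a in row) for row in A)
Q3 = max(sum(abs(A[i][j]) for i in range(k)) for j in range(k))
print(Q1, Q2, Q3)              # all three values are below 1

x = [1.0] * k                  # for a contraction any starting point will do
for _ in range(100):
    x = [sum(A[i][j] * x[j] for j in range(k)) + b[i] for i in range(k)]
print(x)                       # approximately [2.758, 4.578, 3.493]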

8 Discussion

The idea underlying the HRE algorithm is the assumption that experts work independently and try to do their job as best they can. Thus, they may be wrong in a random manner, and if so, it makes sense to treat their estimates as samples, and the expected value of the sample (the arithmetic mean) as the putative value of \(\mu \) for the unknown concept. The absolute mean estimation error may indicate how good such an estimation is. Depending on the data, the algorithm may or may not converge. If the algorithm does not converge (subsequent estimates keep growing) or converges to \(0\), selecting from among the first few estimates the one with the smallest error seems to be the best choice. If the algorithm converges to a non-zero value, it is very often useful to generate subsequent estimates until the current estimate differs from the limit (fixed point) by less than some \(\epsilon \), and then to choose the set of estimations (among those generated) for which the average estimation error is minimal. Of course, this is not the only possible strategy. For instance, if some subset of \(C_{U}\) is particularly important, only the errors for its elements may be taken into account while determining the optimal set of estimations.

Although the problem considered in this work consists in determining the values of \(\mu \) for the elements of \(C_{U}=\{c_{1},\ldots ,c_{k}\}\) on the basis of the concepts \(C_{K}=\{c_{k+1},\ldots ,c_{n}\}\) and the matrix \(M\), it can be reformulated as computing a consistent approximation of the matrix \(M\) in which certain elements are fixed. Indeed, at the end of the presented algorithm all the concepts \(c\in C\) have some values \(\mu (c)\) assigned. Thus, defining \(m_{ij}^{\prime }=\frac{\mu (c_{i})}{\mu (c_{j})}\) allows the construction of a new matrix \(M^{\prime }=[m_{ij}^{\prime }]\), which is a consistent approximation of \(M\) where \(m_{ij}=m_{ij}^{\prime }\) for \(c_{i},c_{j}\in C_{K}\). In the case of the previously considered example, assuming the values of \(\mu (u_{1}),\mu (u_{2})\) and \(\mu (u_{3})\) corresponding to the smallest average absolute mean error \(e_{\mu }(C_{U})\), the matrix \(M^{\prime }\) equals:

$$\begin{aligned} \mathsf M^{\prime } =\left[ \begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} 1 &{} 0.602 &{} 0.789 &{} 0.46 &{} 1.379\\ 1.66 &{} 1 &{} 1.311 &{} 0.763 &{} 2.289\\ 1.266 &{} 0.763 &{} 1 &{} 0.582 &{} 1.746\\ 2.175 &{} 1.311 &{} 1.718 &{} 1 &{} 3\\ 0.725 &{} 0.437 &{} 0.572 &{} 0.333 &{} 1 \end{array}\right] \end{aligned}$$
(49)

In particular it holds that \(m_{45}=m_{45}^{\prime }=3\) and \(m_{54}=m_{54}^{\prime }=\frac{1}{3}\).
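
Once the final values of \(\mu \) are fixed, a one-line construction suffices to build \(M^{\prime }\). The fragment below is a Python sketch in which the vector mu lists the limit estimates from Sect. 7 followed by the two known values:

mu = [2.758, 4.578, 3.493, 6.0, 2.0]
M_prime = [[mu[i] / mu[j] for j in range(5)] for i in range(5)]
# M_prime is consistent by construction (m'_ij * m'_jk * m'_ki = 1 for every
# triad), and m'_45 = 3.0, m'_54 = 1/3 agree with the original entries for
# the two known concepts.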

Although the considerations presented in Sects. 6 and 7 assume that the matrix \(M\) is complete (see Def. 3), the HRE algorithm seems to work without this assumption. For incomplete matrices, the value \(\mu (u)\) for \(u\in C_{U}\) can be determined as long as a path exists in the pairwise comparisons graph \(G\) over \(M\) (Def. 4) between \(u\) and some element \(v\in C_{K}\). Hence, there is a chance that the HRE algorithm may support different pairwise comparisons techniques also when the input data are incomplete. In particular, it might be useful for the AHP (Saaty 1977) approach. The properties of the presented algorithm for incomplete sets of input data will be the subject of future research.

9 Summary

The HRE algorithm presented here, which computes estimations of the initially unknown concepts \(C_{U}\) using information about the known concepts \(C_{K}\) and the matrix \(M\), proposes a new approach to the pairwise comparisons method. It defines an intuitive procedure for using the pairwise comparisons matrix \(M\) to determine the most probable values of the unknown concepts (original stimuli) on the basis of the known ones. The presented procedure iteratively generates sets of estimations, then chooses the set with the smallest average absolute mean estimation error. According to the proven theorem, the size of the estimation error depends on \(\fancyscript{K}\), Koczkodaj’s distance based inconsistency index: the lower the inconsistency index, the lower the estimation errors. The theorem can be particularly useful when the number of iterations of the HRE algorithm is small. In such cases, it may in practice be used to estimate the size of the estimation errors.

For some input data sets, the subsequent estimations produced by the HRE algorithm converge. If this happens, the estimation errors also converge to some limit, thus the number of estimation sets produced by the HRE algorithm does not need to be limited to a few elements. In such a case the sets of estimations can be generated until the current estimation is close enough to the limit towards which the HRE results converge. Such a situation has also been addressed in the paper. The given convergence conditions have an intuitive explanation and in many practical situations are likely to be met.

The HRE algorithm is suitable for any matrix with positive elements, i.e. even in those situations where the applicability of the classical eigenvector method can be limited (finding the largest absolute eigenvalue of a non-reciprocal matrix may be difficult). The presented algorithm can also accommodate more data than can be stored in a single pairwise comparisons matrix: due to the graph representation of the problem, multiple values defining ratios between the same pair of concepts can easily be encompassed within the algorithm as multiple arcs between the same pair of vertices.