1 Introduction

Estimating the elements of a matrix, when only the margins (row and column sums) are known, is a standard problem in many disciplines. The matrix adjustment problem most commonly discussed in the literature, which we will refer to as the two-directional matrix adjustment problem, can be formulated as follows:

Let X* be an m x n unknown matrix whose row and column sums are equal to the known column vector u and row vector v respectively (that is, X*1 = u, 1TX* = v, where 1 is the summation (column) vector and the T superscript denotes transpose).

If certain cells of X* are also known, then by subtracting them from their corresponding row and column sums the problem can be converted to the case of unknown cells. However, in most cases we have at most only indirect information about the values of the elements of the matrix. Generally, this indirect information is a reference matrix, which we assume to be in some sense ‘similar’ to the target matrix. If we denote such an m x n reference matrix by A (also known as the prior), then X* can be estimated with the matrix (also of size m x n) denoted by X which, in the given sense, is the most similar to the reference matrix A and which has row sums equal to the column vector u and column sums equal to the row vector v (i.e., X1 = u, 1TX = v).

Of course, depending on the definition and the given formula (measure) of the ‘similarity’ (or the ‘deviation’ or ‘distance’) of the two matrices (A and X), it is possible that the problem has several solutions (X) for which this formula gives the same value. However, if A is indecomposable, the set of possible solutions is compact, and the objective function is continuously differentiable over that set, there is only one solution, which holds for the methods involved here (de Mesnard 2011).Footnote 1

In any case, the matrix adjustment task can be defined as a mathematical programming problem, where the goal is to find the optimal value of the objective function (the maximum of the similarity formula or the minimum of a monotonically increasing function of the deviation), subject to the constraints X1 = u and 1TX = v (and possibly some nonnegativity or sign-preservation conditionsFootnote 2). Such equality-constrained optimization problems can be solved by the Lagrange multiplier method. By composing the Lagrangian function, differentiating it with respect to each variable (i.e. the unknown elements of the matrix and the Lagrange multipliers) and setting the resulting formulas to zero, we get the so-called normal equations. In the case of the most well-known objective functions these normal equations can be formulated as a functional relationship between the estimated matrix (as dependent variable) and the reference matrix and the Lagrange multipliers (independent variables). However, the Lagrange multipliers are themselves variables determined only by the normal equations, so solving the normal equations requires a further step. In a few cases an explicit formula (for the elements of the estimated matrix) exists. In many other cases the solution can be obtained by an iterative procedure.

The paper is structured as follows: Sect. 2 reviews the main and most widely used types of constrained matrix adjustment methods. Section 3 gives a short overview of how matrices containing some negative elements were traditionally, and in a simple way, treated in matrix adjustment problems. In addition, it describes how the entropy models, the models with quadratic objective functions and in particular the INSD-model deal with negative elements and with the issue of sign preservation. This section also presents the linear algebraic solution of the INSD-model and its iterative solution algorithm, in particular in the not sign-preserving case. Section 4 shows that when the sign-preservation requirement is omitted from the INSD-model, the iteration procedure Huang et al. (2008) suggested for solving the INSD model boils down to a simple iteration procedure which the author of this paper has been using for decades and which—following the classification of de Mesnard (1990)—may be called the ‘additive-correction’ algorithm or, perhaps more precisely, the additive absolute-value-shares proportional correction algorithm. By numerical examples, Sects. 4.3 and 5 demonstrate the merit of this algorithm and discuss the suggested scope of its applicability. In the Conclusion our findings are put in a broader context by emphasising the merit of having such transparent formulas describing the relationship between the estimated matrix, the reference matrix and the known margins. The importance of trying to obtain as good reference matrices as possible and of combining these methods with other matrix adjustment and estimation methods is also emphasized.

2 The main types of objective functions of matrix adjustment

In the academic literature the matrix adjustment problem appeared about one hundred years ago, first as a matrix filling problem of cross-tabulations (or “contingency tables”, as Pearson 1904 named them) of demographic data. In particular, the task was to update such tables (e.g. census data) to given (known, already updated) marginal totals, or to adjust sample estimates of contingency matrices to known population values for their marginal totals (Deming and Stephan 1940).

Many researchers from several disciplines developed solution methods for this kind of matrix adjustment or matrix balancing problem, in some cases independently, being unaware of others’ achievements (see a review of these in Lahr and de Mesnard 2004). That is why mathematically equivalent methods are known by different names in different disciplines.

The deviation of the estimated matrix from such reference matrices can be measured by various functions. In the academic literature two main types of such objective functions were developed: information (entropy) measure-based functions (which contain the logarithm function borrowed from information theory) and error-minimization functions, where the ‘error’ means the weighted sum of the positive (absolute or squared) differences between the estimated values and their counterparts in the reference matrix (in the latter case these are quadratic objective functions based on the principle of least squares).

In the literature a vast discussion emerged about which method is more efficient and more reliable, and under what circumstances. According to experience so far, the best choice depends on the mathematical properties of the reference matrix, the target margins and the expectations about the target matrix (non-negativity, zero values, sign switching, sparse matrix, etc.), as well as on the economic content of the matrix.

2.1 The entropy models

One of the earliest and still most popular entropy-models is the so-called RAS-method. Its objective function is the following (see Bacharach 1970):

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {x_{i,j} \ln \left( {x_{i,j} /a_{i,j} } \right) \to \min } } $$
(1)

By using the method of Lagrangian multipliers Bacharach (1970) proved that the solution of (1) subject to

$$ {\mathbf{X1}} = {\mathbf{u}}, $$
(2)
$$ {\mathbf{1}}^{{\mathbf{T}}} {\mathbf{X}} = {\mathbf{v}} $$
(3)

can be expressed as the

$$ {\mathbf{X}} = \hat{\mathbf{{r}}}\mathbf{A}{\hat{\mathbf{s}}},\,({\text{or}}\,{\text{in}}\,{\text{algebraic}}\,{\text{notations}}:x_{i,j} = a_{i,j} \cdot r_{i} \cdot s_{j} \,{\text{for}}\,{\text{all}}\,i\,{\text{and}}\,j\,{\text{indices}}) $$
(4)

‘biproportional’ formula (which is why the model is called the ‘RAS-model’), where ˆ denotes the diagonal matrix of a vector, and r and s are the vectors generated from the Lagrangian multipliers of the constraints (2) and (3), respectively. He also showed that if matrix A is indecomposable, then the solution \({\mathbf{X}} = \hat{\mathbf{r}}\mathbf{A}\hat{\mathbf{s}}\) is unique, except for any arbitrary δ and 1/δ scalar multipliers for the r and s vectors respectively. This means that if an r and s vector pair is a solution, then the r·δ and s/δ vector pair is also.

Bacharach (1970) proved that the above solution of the RAS-model can be achieved by the following iterative proportional fitting procedure (called the RAS-algorithm):

Let xi,j(0) = ai,j, \({\mathbf{w}} = {\mathbf{1}}^{{\mathbf{T}}} {\mathbf{A}},\,{\mathbf{q}} = {\mathbf{A1}},\,{\mathbf{R}} = {\hat{\mathbf{q}}}^{{ - {1}}} {\mathbf{A}},\,{\mathbf{C}} = \mathbf{A}\hat{\mathbf{w}}^{{ - {1}}}\) and compute for n = 1, 2, … the following two formulas:

$$ x_{i,j}^{{\left( {\text{n}} \right)\left( {\text{r}} \right)}} = x_{i,j}^{{({\text{n}} - {1})}} \cdot g_{i}^{{\left( {\text{n}} \right)}} $$
(5)

(where gi(n) = ui / Ʃj xi,j(n−1)) and

$$ x_{i,j}^{{\left( {\text{n}} \right)}} = x_{i,j}^{{\left( {\text{n}} \right)({\text{r}})}} \cdot h_{j}^{{\left( {\text{n}} \right)}} $$
(6)

formulas, where hj(n) = vj / Ʃi xi,j(n)(r).

Normally this iteration process converges so that \(\underset{n\to \infty }{\mathrm{lim}}{x}_{i,j}^{(n)}\) = ai,j · ri · sj (on the conditions for convergence see McGill 1977; Möhr et al. 1987).

The RAS-algorithm works even when some elements of A are zero, in which case the corresponding elements of X also remain zero. In this case the solution can be computed by minimizing the \(\sum\nolimits_{i = 1}^{m} {\sum\nolimits_{j = 1}^{n} {x_{i,j} } }\) ln(xi,j/ai,j) sum subject to (2) and (3), where the summation is restricted to those indices for which ai,j ≠ 0.

From formulas (5) and (6) of the RAS algorithm it is also obvious that if A, u and v are non-negative, then the estimated X will also be non-negative (a formal proof can be found in Bacharach 1965). In general, if ai,j ≠ 0 then xi,j should have the same sign as ai,j (i.e. \(\frac{{x}_{i,j}}{{a}_{i,j}}\) > 0 should hold), otherwise the ln(xi,j/ai,j) term cannot be computed. Note that even if some elements of A, u and v are negative, technically the RAS-algorithm may still work, but sign changes could occur, though their occurrence is extremely unlikely if u and v are non-negative (Günlük-Șenesen and Bates 1988).
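For illustration, a minimal Python sketch of the RAS (iterative proportional fitting) procedure of formulas (5) and (6) might look as follows for a non-negative prior with consistent margins; the function name, tolerance and iteration cap are illustrative choices only.

```python
import numpy as np

def ras(A, u, v, tol=1e-10, max_iter=1000):
    """Sketch of iterative proportional fitting (RAS) for a non-negative prior A."""
    X = A.astype(float).copy()
    for _ in range(max_iter):
        g = u / X.sum(axis=1)      # row scaling factors, formula (5)
        X = X * g[:, None]
        h = v / X.sum(axis=0)      # column scaling factors, formula (6)
        X = X * h[None, :]
        if np.abs(X.sum(axis=1) - u).max() < tol:   # remaining row discrepancy
            break
    return X

# small illustrative example with consistent margins (sum(u) = sum(v) = 10)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(ras(A, u=np.array([4.0, 6.0]), v=np.array([5.0, 5.0])))
```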

2.2 Models with power functions

Since the first mathematical discussion of the constrained matrix adjustment problem in the academic literature (Deming and Stephan 1940), various models with quadratic objective functions have been proposed and analysed. In the most general case, the

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {(x_{i,j} {-}a_{i,j} )^{p} /v_{i,j} } } $$
(7)

type of power function can be used, where (in the case of a stochastic interpretation of the problem) vi,j may represent the variance of the (i,j)-th element of the X or A matrix, depending on which is regarded as stochastic (see Stone 1977; Byron 1978; Günlük-Șenesen and Bates 1988). If vi,j = ai,jq then (7) can be rewritten as

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {(x_{i,j} {-}a_{i,j} )^{p} /a_{i,j}^{q} } } $$
(8)

However, in practice this formula is usually applied with the settings p = 2 and q = 1 or 0. The p = 1 case and the q = 0 case have many drawbacks (e.g. the p = 1 case requires the computation of the absolute value of the numerator, but the resulting | xi,j − ai,j | function is not differentiable, while the q = 0 case, even if applied to input–output coefficients thereby eliminating the bias resulting from the different size of the individual industries, still does not address the problem of the tendency to produce large proportional changes in small coefficients—see Lecomber 1975), therefore we deal only with the most common case, when p = 2 and q = 1.

In this case the objective function can be written as follows:

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {(x_{i,j} {-}a_{i,j} )^{2} /a_{i,j} \to {\text{min}}} } $$
(9)

If, in the case of contingency tables, xi,j and ai,j represent two-dimensional probability distributions, then half of the above expression is Pearson’s χ2-statistic (see a detailed discussion of this stochastic interpretation in Smith 1947).

By introducing the ratios zi,j = xi,j /ai,j, (9) can be rewritten as the weighted sum of the squared relative (percentage) deviations of the xi,j elements from the corresponding ai,j elements:

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {a_{i,j} \cdot (z_{i,j} {-}{1})^{2} \to {\text{min}}} } $$
(9a)

This objective function is attractive partly because the expression \(\sum\nolimits_{i = 1}^{m} {\sum\nolimits_{j = 1}^{n} {(x_{i,j} {-}a_{i,j} )^{2} } }\) in (9) can be interpreted as the square of the Euclidean distance of the vectors \({\hat{\mathbf{a}}}\) and \({\hat{\mathbf{x}}}\), where \({\hat{\mathbf{a}}}\) = (a1,1, a1,2,…, a1,n, a2,1, a2,2,…, a2,n,…, am,1, am,2,…, am,n) and \({\hat{\mathbf{x}}}\) = (x1,1, x1,2,…, x1,n, x2,1, x2,2,…, x2,n,…, xm,1, xm,2,…, xm,n), i.e. they are formed from A and X by arranging the rows of the respective matrices end to end. More importantly, from (9a) one can see that the weighting by ai,j represents a reasonable compromise (i.e. opting for a balanced risk of distorting the large or the small entries) between not weighting the (zi,j − 1)2 squared percentage differences at all and using the (xi,j − ai,j)2 squared absolute differences.

Indeed, this objective function was already used by Deming and Stephan (1940) (p. 429, Eq. (3)) and Friedlander (1961) (p. 414), who duly applied the method of Lagrangian multipliers to the constrained optimization problem consisting of (2), (3) and (9). Without loss of generality, they halved the objective function (9) and hence derived the following simple relationship between the optimal value of xi,j and the Lagrangian multipliers:

$$ x_{i,j} = a_{i,j} \cdot \left( {{1} + \lambda_{{\text{i}}} + \tau_{{\text{j}}} } \right) $$
(10)

where λi is the multiplier of the discrepancy of the i-th row and τj is the multiplier of the discrepancy of the j-th column (i = 1, 2, …, m; j = 1, 2, …,n).
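For completeness, relation (10) follows from the Lagrangian of the halved objective (9) subject to (2) and (3), written here with one common sign convention for the multipliers:

$$ L = \frac{1}{2}\sum\limits_{i = 1}^{m} \sum\limits_{j = 1}^{n} \frac{(x_{i,j} - a_{i,j})^{2}}{a_{i,j}} - \sum\limits_{i = 1}^{m} \lambda_{i} \Big( \sum\limits_{j = 1}^{n} x_{i,j} - u_{i} \Big) - \sum\limits_{j = 1}^{n} \tau_{j} \Big( \sum\limits_{i = 1}^{m} x_{i,j} - v_{j} \Big), \qquad \frac{\partial L}{\partial x_{i,j}} = \frac{x_{i,j} - a_{i,j}}{a_{i,j}} - \lambda_{i} - \tau_{j} = 0 \;\; \Rightarrow \;\; x_{i,j} = a_{i,j} \left( 1 + \lambda_{i} + \tau_{j} \right). $$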

3 Treatment of negative elements in the matrix adjustment problems

3.1 Simple treatment of negative elements in matrix adjustment methods

By reviewing the academic literature on margin-constrained matrix adjustment methods, we can enumerate the following simple treatments of negative elements in the reference matrix:

1. leaving them unchanged (that is, if ai,j < 0 then setting xi,j to ai,j, i.e. xi,j = ai,j);

2. setting them to zero (either only in the reference matrix or directly, so that xi,j = 0);

3. setting the corresponding elements of the X matrix exogenously;

4. aggregating the data so that the negative elements are added to larger positive elements and the aggregate figure is non-negative;

5. reformulating the problem or introducing ‘mirror accounts’ (Lenzen et al. 2014) so that negative items disappear (e.g. in commodity balances, rearranging the final demand block’s negative stock change to a positive ‘decrease in inventories’ on the side of the sources);

6. imposing an explicit xi,j ≥ 0 nonnegativity constraint and solving the model as a mathematical programming problem with any related solver;

7. in the case of estimating a (quadratic) so-called Social Accounting Matrix (SAM), where the row and column margins are the same (u = v): if the (i,j)-th element of the reference matrix is negative (the revenue of the i-th account from the j-th account), it is treated instead as a negative revenue of the j-th account from the i-th account.

3.2 Treatment of negative elements in entropy-models

Although the sign-preserving nature of the entropy models made them popular among researchers dealing with various problems (in particular among economists who tried to update and project input-coefficient matrices), for a long time it prevented the application of the entropy models to matrices with negative element(s).

The basic problem of the entropy models is (as pointed out by Günlük-Șenesen and Bates 1988) that if xi,j in (1) is negative, then these ‘weights’ are ‘counterproductive’ in the sense that instead of penalizing the corresponding percentage deviation of xi,j from ai,j, they reward such deviations by subtracting the value of the corresponding ln(xi,j/ai,j) terms from the penalty (objective) function.

Furthermore, if xi,j < ai,j then the ln(xi,j/ai,j) term can be negative which makes the whole weighting inappropriate and the objective function meaningless.Footnote 3

To solve these problems Günlük-Șenesen and Bates proposed the following ‘information change’ measure as the objective function:

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {|x_{i,j} {\text{ln(}}x_{i,j} /a_{i,j} {)}| \to {\text{min}}} } $$
(11)

Although this allows for negative xi,j and/or ai,j elements, it still does not allow for sign flips. Moreover, these authors did not elaborate on an algorithm that could find its solution. Instead, they present a modified Lagrangian, which may be written correctlyFootnote 4 as

$$ F = \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {\left\{ {|x_{i,j} |\left[ {{\text{ln}}\left( {x_{i,j} /a_{i,j} } \right){-}{1}} \right] + {\text{ln}}\left( {r_{i} } \right) \cdot \left( {u_{i } {-}\sum\limits_{j = 1}^{n} {x_{i,j} } } \right) + {\text{ ln}}\left( {s_{j} } \right) \cdot \left( {v_{j } {-}\sum\limits_{i = 1}^{m} {x_{i,j} } } \right)} \right\}} } $$
(12)

Note that although they do not explain it, and it is not clear from the formula, taking the logarithm of the Lagrangian multipliers is probably meant to harmonize them with the ln(xi,j/ai,j) term. The solution of the above model is

$$ x_{i,j} = r_{i} \cdot a_{i,j} \cdot s_{j} \,{\text{if}}\,a_{i,j} > 0\,{\text{and}}\,x_{i,j} = \left( {{1}/r_{i} } \right) \cdot a_{i,j} \cdot \left( {{1}/s_{j} } \right)\,{\text{if}}\,a_{i,j} < 0 $$
(13)

The above treatment of the negative elements expresses the principle that all elements have to be adjusted in the direction that eliminates the given (row- or column-) discrepancy. For example, when ri > 1 (i.e. when the actual row total is less than the prescribed one), the absolute values of the negative elements have to be decreased by dividing them by ri. We will employ this principle in other models of this paper.

Strangely, in the next 15 years the findings of Günlük-Șenesen and Bates (1988) ‘was little acknowledged in subsequent literature on updating input–output tables’ so that Junius and Oosterhaven (2003) had to ‘rediscover’ it (Temurshoev et al. 2011). The objective function of their ‘generalized RAS’ (GRAS) method was

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {|a_{i,j} | \cdot z_{i,j} \cdot {\text{ln}}\left( {z_{i,j} } \right),} } $$
(14)

where zi,j = xi,j/ai,j, so that zi,j > 0. However, this objective function cannot be applied if not all columns and rows have a positive element (Temurshoev et al. 2013), and even in normal cases it does not always give the best results among the estimation methods available (Lemelin 2009). Later on, Oosterhaven (2005) himself also pointed out that negative and positive differences can cancel each other out in the originally proposed objective function (creating the illusion of a perfect fit), and instead of it he proposed the

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {|a_{i,j} \cdot z_{i,j} \cdot {\text{ln(}}z_{i,j} {)}|} } $$
(14a)

absolute information loss (AIL) function. Later again, Lenzen et al. (2007) corrected the GRAS objective function by replacing the ln(zi,j) term by ln(zi,j/e), where e is the Euler-number (the base of the natural logarithm).Footnote 5 Huang et al. (2008) further changed the objective function to

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {|a_{i,j} | \cdot (z_{i, \, j} \cdot {\text{ln(}}z_{i, \, j} /e{)} + {1})} } $$
(15)

and named this ‘improved GRAS’ (IGRAS). They obtained this by adding a constant to the function so that it becomes zero when zi,j = 1; this does not affect the location of the optimum.
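Indeed, at zi,j = 1 each term of (15) vanishes:

$$ |a_{i,j} | \cdot \left( z_{i,j} \ln (z_{i,j} /e) + 1 \right)\Big|_{z_{i,j} = 1} = |a_{i,j} | \cdot \left( \ln (1/e) + 1 \right) = |a_{i,j} | \cdot ( - 1 + 1) = 0 . $$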

In any case, it can be seen from the logarithm in the objective function that if the model has any solution, then xi,j/ai,j = zi,j ≥ 0 holds for every i, j pair. Thus, the solution of the GRAS model guarantees that the matrix elements preserve their sign.

Huang et al. (2008) duly write the associated Lagrangian function of (15) and derive the following optimality conditions:

$$ z_{i,j} = \left\{ {\begin{array}{*{20}l} {e^{{\lambda_{i} }} e^{{\tau_{j} }} } \hfill & {if\quad a_{i,j} > 0} \hfill \\ 1 \hfill & {if\quad a_{i,j} = 0} \hfill \\ {e^{{ - \lambda_{i} }} e^{{ - \tau_{j} }} } \hfill & {if\quad a_{i,j} < 0 } \hfill \\ \end{array} } \right. $$
(16)

where λi and τj are the Lagrangian multipliers belonging to the row- and column-sum deviations. Since the exponential function is always positive, it can also be seen from this that the solution of the IGRAS-model (if it exists at all) must be sign-preserving. If we use the notations ri = \({e}^{{\lambda }_{i}}\) and sj = \({e}^{{\tau }_{j}}\), the solution can be rewritten as zi,j = ri·sj if ai,j > 0 and zi,j = (1/ri)‧(1/sj) if ai,j < 0, much like what Günlük-Șenesen and Bates (1988) produced (see Eq. (13)), but of course with different values for ri and sj. From this form of the solution it can also easily be seen that the resulting adjustment treats the positive and negative elements differently, so that both change in the direction that eliminates the row- and column-discrepancies.

However, ri and sj can be computed only by a complicated iteration process (Junius and Oosterhaven 2003; Temurshoev et al. 2013).

A problem of the entropy models is that the deviation of xi,j from ai,j is not treated symmetrically (as Huang et al. (2008) put it, the objective function is ‘biased’) and hence the entropy measure cannot be interpreted as their ‘distance’.

Although models with such logarithmic terms seem attractive due to their apparent relationship with information theory, the division of xi,j by ai,j can hardly be interpreted as an entropy measure, especially since ai,j cannot be regarded as a realisation of any random variable. Indeed, there are some attractive (in some cases only partial) interpretations of the chosen objective functions (see for example Bacharach (1970), pp. 83–84), but they are debatable and there is no clear rule about which one to accept (see for example McDougall’s (1999) critique of the misinterpretation of the concept of ‘cross-entropy’ by some modelers).

Lemelin (2009) compares the GRAS-model with the cross-entropy measure of information loss of Kullback and Leibler (1951), referred to below as the K-L measure. To make it meaningful for negative matrix elements too, he redefines the ‘a priori probabilities’ qi,j and the ‘a posteriori probabilities’ pi,j in the \(\sum\nolimits_{i = 1}^{m} {\sum\nolimits_{j = 1}^{n} {p_{i,j} {\text{ln}}\left( {p_{i,j} /q_{i,j} } \right)} }\) minimand so that they are the absolute-value shares of the given elements in their grand total: qi,j = |ai,j|/Ʃi Ʃj |ai,j| and pi,j = |xi,j|/Ʃi Ʃj |xi,j|. Based on this he observes the following:

Property (due to Lemelin 2009)

The GRAS objective function of Junius and Oosterhaven (2003) is not an exact representation of the K–L measure.

He demonstrates this by simple algebraic calculus, taking into account that in the case of negative elements the Ʃi Ʃj |xi,j| grand total is not fixed. He illustrates the difference by several numerical examples, starting from the example of Junius and Oosterhaven (2003). This is first modified (by changing the prescribed marginal totals and the sign of some elements) so that it can be interpreted as a net world trade matrix; then, by further modifying the prescribed marginal totals appropriately, it is reinterpreted as the matrix of net investment positions, where ai,j represents the net asset of the j-th country in the i-th category. In this latter case, the row sums of X must be zero and some of the column sums must be negative. Table 1 shows the initial matrix and the required row and column sums.Footnote 6

Table 1 Initial matrix of net international investment positions and prescribed margins

First, by applying the GRAS-method to these data, he found that the figures of the resulting matrix ‘are out of proportion with the initial values’ (i.e. they are too large in absolute value). Then, by applying the above K-L measure to the same data, he obtained the following results (see Table 2 here, and Table 8 in Lemelin 2009).

Table 2 Matrix of net international investment positions adjusted by Lemelin’s model

These results will be compared with the results of the INSD model to be discussed subsequently in the present paper.
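As a point of reference, Lemelin’s absolute-value-share version of the K-L measure can be sketched in Python as follows; the function name and the handling of zero cells are illustrative assumptions.

```python
import numpy as np

def kl_abs_shares(X, A):
    """K-L information loss with Lemelin's absolute-value shares p and q."""
    p = np.abs(X) / np.abs(X).sum()      # 'a posteriori' shares p_ij
    q = np.abs(A) / np.abs(A).sum()      # 'a priori' shares q_ij
    mask = p > 0                         # cells with p_ij = 0 contribute zero
    # assumes q_ij > 0 wherever p_ij > 0, otherwise the measure is undefined
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```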

3.3 Treatment of negative elements in models with quadratic objective function

Clearly, quadratic objective functions like (9) can deal with negative elements in the reference matrix and do not prevent sign flips. However, in many applications, i.e. when the reference matrix and/or the matrix to be estimated is non-negative, modelers regarded the possibility of sign flips as a drawback.

Henry (1973, 1974)—while reproducing (by formally presenting the related Lagrangian function) the relationship (10) between the initial and the estimated matrix—argues that it is rather unlikely that the sum 1 + λi + τj is negative, and therefore that the objective function (9) also tends to preserve the sign of the matrix elements.

What matters more for us about Henry’s second paper is that he reports having received some ideas and proposals from Lecomber (see Lecomber 1971) regarding the merit of using the absolute value of ai,j in the denominator of model (9), which in this case can be written as

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {(x_{i,j} {-}a_{i,j} )^{2} /|a_{i,j} | \to {\text{min}}} } $$
(17)

However, apparently Henry did not realise the importance of this proposal. First he just demonstrates that in the case of some negative ai,j elements the original minimand (9) does not produce the optimal solution (since if ai,j < 0 then the corresponding 1/ai,j > 0 second-order condition is not met), and then he just gives a numerical example by which he tries to convince the reader that even in the case of negative ai,j elements the original minimand (9) results in a solution which, although not optimal, is saddle-point-like and in which the (only) ‘negative entry has received fair and equitable treatment’. In his example this meant that when all margins were doubled, the negative element was also doubled in the estimated matrix. But this is just what has to be avoided in many cases!

Indeed, in the case of any ‘net’ economic variable that is a residual, such as household savings (income minus consumption expenditures), inventory changes, etc., one has to try to estimate the two categories whose difference is the given net variable. Such ‘net’ economic variables tend to be extremely volatile, because the underlying gross flows may change in an uncoordinated manner. So net flows are almost impossible to model; it is largely preferable to model the underlying gross flows. If this is not possible (e.g. due to data unavailability, as in the case of the ‘changes in inventories’) and the behavior of these categories is rather different, then the matrix adjustment techniques are seldom applicable. However, there are many cases when the ‘net’ variable behaves reasonably, i.e. it moves in the same direction as the corresponding row- and column-totals.

For example, in the case of a devaluation of the domestic currency (which decreases domestic demand and increases the incentive/pressure to sell abroad) one may expect that—without having detailed information on the changes in the patterns of exports and imports—the trade balances (exports minus imports) of each commodity improve, either so that a hitherto positive balance increases further, or so that an originally negative balance shrinks or even turns positive. In other words, the balances do not change proportionately, as Henry would regard to be ‘fair’ and ‘equitable’.

In any case, Lecomber did not seem to be satisfied with Henry’s answer, and in a book chapter written by him (see Lecomber 1975) he remarks that in the case of negative elements ‘The Friedlander minimand given by Henry (his Eq. 3) also breaks down, but a simple modification—replacing ai,j in the denominator by | ai,j |—gives satisfactory answers’.Footnote 7 But to support this claim he just gives the following vector-analogue example: ‘consider adjusting the vector (− 1, 2) to sum of 4. Pro-rata adjustment (analogous to RAS) gives (− 4, 8), while the modified Friedlander adjustment gives (1, 3).’

The problem with this remark is that Lecomber’s example is wrong. The resulting vector is not (1, 3) (for which the minimand is as large as 4.5) but (0, 4) (for which the minimand attains its minimum, 3). Note also that to get the ‘modified Friedlander-adjustment’ solution of such vector- (or matrix-) analogous problems, one can use not only the related mathematical programming solution algorithms but also the much simpler, but so far unnoticed, solution algorithm which we are going to discuss in Sect. 4.2.
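This can be checked directly. The following minimal sketch applies a single absolute-value-shares correction step to Lecomber’s vector example; with only one sum constraint, this single step already satisfies the first-order condition of (17).

```python
import numpy as np

a = np.array([-1.0, 2.0])                 # reference vector
target = 4.0                              # prescribed sum

g = target - a.sum()                      # discrepancy to distribute (= 3)
x = a + g * np.abs(a) / np.abs(a).sum()   # distribute pro rata to |a|-shares

minimand = lambda y: np.sum((y - a) ** 2 / np.abs(a))
print(x)                                  # [0. 4.]
print(minimand(x))                        # 3.0  (the minimum)
print(minimand(np.array([1.0, 3.0])))     # 4.5  (Lecomber's reported vector)
```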

Unfortunately, Lecomber made his remark only in a footnote and apparently it did not impress either Henry or others very much.

Sign preservation can also be built into models containing quadratic objective functions. Jackson and Murray (2004), for example, use the following equivalent form of the minimizing criterion (17) (see their Model 10)

$$ \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{n} {(z_{i, \, j} \cdot a_{i,j} {-}a_{i,j} )^{2} /|a_{i,j} | \to {\text{min}}} } $$
(17a)

where zi,j = xi,j /ai,j, subject to the usual marginal conditions and the zi,j ≥ 0 nonnegativity constraints. Due to the inequality conditions, this so-called ‘sign-preserving squared differences’ model cannot be solved using the simpler set of scaling techniques but only by commercial mathematical programming software packages (Lahr and de Mesnard 2004; Huang et al. 2008). However, in such solutions it is not transparent how the results are related to the parameters of the model. This makes the further analysis of the results and the further development of the model harder.

3.4 The Improved Normalized Squared Differences model

Maybe that is why Huang et al. (2008)—who call (17) the ‘Improved Normalized Squared Differences’ (INSD) model—prevent the switching of the sign of the matrix elements in an alternative way. They use an additional + M/2 \(\sum\nolimits_{i = 1}^{m} {\sum\nolimits_{j = 1}^{n} {|a_{i,j} | \cdot \, [{\text{min}}(0,z_{i,j} )]^{{2}} } }\) component in the Lagrangian function, where M is an arbitrarily chosen, sufficiently large positive number which, multiplied by the [min(0, zi,j)]2 factor (which is positive if zi,j < 0), produces a sufficiently large penalty to prevent this from happening. We may refer to this model as the Sign-preserving Improved Normalized Squared Differences (SINSD) model. They then derive the following optimality conditions:

$$ z_{i,j} = \left\{ {\begin{array}{*{20}l} 1 \hfill & \quad{if\quad a_{i,j} = 0} \hfill \\ {1 + sgn\left( {a_{i,j} } \right)\left( {\lambda_{i} + \tau_{j} } \right)} \hfill & \quad{if\quad this\,is\,nonnegative\,or\,M = 0} \hfill \\ {0 } \hfill & \quad{if\quad 1 + sgn\left( {a_{i,j} } \right)\left( {\lambda_{i} + \tau_{j} } \right) < 0\,and\,M \to \infty } \hfill \\ \end{array} } \right. $$
(18)
$$ \lambda_{{\text{i}}} = \left\{ {\left( {{\text{u}}_{{\text{i}}} {-}\Sigma_{{\text{j}}} a_{i,j} } \right) + \Sigma_{{\text{j}}} (M \cdot a_{i,j} \cdot {\text{min}}(0,z_{i,j} ){-}\tau_{{\text{j}}} \cdot |a_{i,j} |)} \right\}/\Sigma_{{\text{j}}} |a_{i,j} |,\quad {\text{and}} $$
(18a)
$$ \tau_{{\text{j}}} = \left\{ {\left( {{\text{v}}_{{\text{j}}} {-}\Sigma_{{\text{i}}} a_{i,j} } \right) + \Sigma_{{\text{i}}} (M \cdot a_{i,j} \cdot {\text{min}}(0,z_{i,j} ){-}\lambda_{{\text{i}}} \cdot |a_{i,j} |)} \right\}/\Sigma_{{\text{i}}} |a_{i,j} |, $$
(18b)

where λi and τj are the Lagrangian multipliers of the row- and column sum deviations.

To solve this system of simultaneous nonlinear equations they suggest the following iteration algorithm: initialize zi,j = 1, λi = 0, τj = 0, and recalculate them with Eqs. (18), (18a) and (18b) iteratively. They claim that in this way one finally gets the solution for zi,j. Although they applied this to a numerical example, the reader cannot find out how this iteration process could produce the published results. Only Temurshoev et al. (2011) (who also corrected the sign of the row- and column-total discrepancies in their Lagrangian) corrected the above sequence of the iteration steps by clarifying that first the λi-s have to be computed from (18a), then these have to be substituted into (18b) to compute the τj-s, and only then are the zi,j-s computed from (18).

Note that Huang et al. (2008) derive the following alternative optimality condition (see their Eq. (25)), which—naturally together with (2) and (3)—can be used instead of (18), (18a) and (18b):

$$ x_{i,j} {-}a_{i,j} + M \cdot a_{i,j} \cdot {\text{min}}(0,z_{i,j} ) = |a_{i,j} | \cdot \left( {\lambda_{{\text{i}}} + \tau_{{\text{j}}} } \right) $$
(19)

As it could be expected from the titles of their articles, Huang et al. (2008) and Temurshoev et al. (2011) were interested in models which are sign-preserving. However, they observe that since the INSD objective function is the first-order Taylor-series approximation of the sign-preserving IGRAS objective function, it also likely produces sign-preserving results.

After discussing reasons for sign changes in inventories (which products change sign, and how often, in the related column of the input–output tables), Lenzen et al. (2014) reverse the generally negative judgement of sign flips and in certain cases regard them as even desirable.

If in this spirit we remove the (sign-preserving) M parameter from (19) we get the

$$ x_{i,j} = a_{i,j} + |a_{i,j} | \cdot \left( {\lambda_{{\text{i}}} + \tau_{{\text{j}}} } \right) $$
(20)

simpler optimality condition for the INSD model, which is equivalent to the zi,j = 1 + sgn(ai,j)·(λi + τj) part (the case M = 0) of (18). From this latter form it appears that, for computing the optimal zi,j, the Lagrange multipliers should be added in the case of positive, and subtracted in the case of negative, ai,j elements. Note that it is also apparent from formula (20) that it leads to the same result for xi,j when any scalar φ is added to each λi and subtracted from each τj: replacing λi with λi + φ and τj with τj − φ, they still satisfy (20). So while in the case of the ‘multiplicative’ RAS the degree of freedom of the Lagrangian multipliers is manifested in a proportionality factor, for the INSD it appears in an additive component. In view of (20) we may call the INSD-model the ‘A + R|A| + |A|S model’, where |A| denotes the matrix containing the absolute values of the elements of A.
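In matrix notation, with λ̂ and τ̂ denoting the diagonal matrices formed from the multiplier vectors (these play the role of the R and S in this name), formula (20) reads

$$ {\mathbf{X}} = {\mathbf{A}} + {\hat{\boldsymbol{\lambda }}}\,|{\mathbf{A}}| + |{\mathbf{A}}|\,{\hat{\boldsymbol{\tau }}} . $$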

Introducing di,j = xi,j − ai,j and subtracting ai,j from each side of (20), we obtain

$$ d_{i,j} = |a_{i,j} | \cdot \left( {\lambda_{{\text{i}}} + \tau_{{\text{j}}} } \right) $$
(21)

expressing the relation between the Lagrangian multipliers and the (optimal) changes of the elements of the matrix. Note, that although the INSD model is not strictly biproportional, both the row-wise and column-wise adjustments of (changes in) the given elements are proportional to the absolute value of the given entry of the reference matrix.

Equation (20) together with (2) and (3) forms a system of linear equations with n·m + n + m equations and the same number of variables. However, one equation and one variable can be dropped, since the conditions of (2) and (3) are linearly interdependent provided they are consistent (as already pointed out by Deming and Stephan 1940 and discussed further by Friedlander 1961), and since one of the λi and τj variables can be chosen arbitrarily (as demonstrated above by showing that if λi and τj represent a solution, then so do λi + φ and τj − φ, whatever scalar φ is used). Therefore, the ‘truncated’ system of independent linear equations can be solved by any of the known methods.

However, the xi,j, λi and τj variables need not be determined simultaneously by such a big system of linear equations. The λi and τj variables can be solved for first, and the xi,j variables can then be computed easily (e.g. using Eq. (20)). This is demonstrated by the following considerations:

In the not sign-preserving case (i.e. if M = 0) Eqs. (18a) and (18b) boil down respectively to the

$$ \lambda_{{\text{i}}} = \left\{ {\left( {u_{{\text{i}}} {-}\Sigma_{{\text{j}}} a_{i,j} } \right){-}\Sigma_{{\text{j}}} (\tau_{{\text{j}}} \cdot |a_{i,j} |)} \right\}/\Sigma_{{\text{j}}} |a_{i,j} |, $$
(22)
$$ \tau_{{\text{j}}} = \left\{ {\left( {v_{{\text{j}}} {-}\Sigma_{{\text{i}}} a_{i,j} } \right){-}\Sigma_{{\text{i}}} (\lambda_{{\text{i}}} \cdot |a_{i,j} |)} \right\}/\Sigma_{{\text{i}}} |a_{i,j} |. $$
(23)

‘normal equations’, which are similar (apart from using the absolute values |ai,j| instead of the ai,j-s) to Eq. (21) of Deming and Stephan (1940) or Eqs. (6) and (7) of Friedlander (1961), and which can be solved as a system of linear equations. Since in this (M = 0) case λi and τj depend only on each other, we have to compute xi,j from (20) only after the solution for λi and τj is found.

Now take S = |A|, w = 1TS, q = S1, and let us redefine \({\mathbf{R}} = {\hat{\mathbf{q}}}^{{ - {1}}} {\mathbf{S}}\) and \({\mathbf{C}} = {\mathbf{S}}{\hat{\mathbf{w}}}^{{ - {1}}}\); where R and C are matrices containing the row- and column-wise shares (structure) of S.

Let gi = ui − Ʃj ai,j and hj = vj − Ʃi ai,j denote the differences of the prescribed row and column totals from those of the matrix A.

Using the notations just introduced, and multiplying Eqs. (22) and (23) by qi = Ʃj|ai,j| and wj = Ʃi|ai,j| respectively, we get their

$$ \lambda_{{\text{i}}} \cdot q_{i} = g_{i} {-}\Sigma_{{\text{j}}} (\tau_{{\text{j}}} \cdot s_{i,j} ) $$
(24)
$$ \tau_{{\text{j}}} \cdot w_{j} = h_{j} {-}\Sigma_{{\text{i}}} (\lambda_{{\text{i}}} \cdot s_{i,j} ) $$
(25)

simpler versions. With matrix algebraic notations (24) and (25) can be combined into the

$$ \left[ {\begin{array}{*{20}c} {{\hat{\mathbf{q}}}} & {\mathbf{S}} \\ {{\mathbf{S}}^{{\mathbf{T}}} } & {{\hat{\mathbf{w}}}} \\ \end{array} } \right] \left[ {\begin{array}{*{20}c} {{\varvec{\uplambda}}} \\ {{\varvec{\uptau}}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\mathbf{g}} \\ {\mathbf{h}} \\ \end{array} } \right] $$
(26)

system of inhomogeneous linear equations, where λ, τ, g and h are the column vectors containing the λi, τj, gi and hj elements respectively.

Since 1Tg = 1Th, \( {\hat{\mathbf{q}}} {\mathbf{1}} = {\mathbf{q}} = {\mathbf{S1}}\) and \({\hat{\mathbf{w}}\mathbf{1}} = {\mathbf{S}}^{{\text{T}}} {\mathbf{1}}\), therefore the following holds:

$$ \left[ {\begin{array}{*{20}c} {{\hat{\mathbf{q}}}} & {\mathbf{S}} \\ {{\mathbf{S}}^{{\mathbf{T}}} } & {{\hat{\mathbf{w}}}} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} 1 \\ { - 1} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 \\ 0 \\ \end{array} } \right] $$
(27)

Since (27) shows that the vector composed of 1-s and − 1-s solves the corresponding homogeneous system of linear equations, its \(\left[\begin{array}{cc}\widehat{\mathbf{q}}& \mathbf{S}\\ {\mathbf{S}}^{\mathbf{T}}& \widehat{\mathbf{w}}\end{array}\right]\) (symmetric) coefficient matrix (denoted subsequently by S*) is singular (i.e. its rows/columns are linearly interdependent). Therefore the system of linear equations (26) cannot be solved by multiplying it from the left by the (non-existent) inverse of the matrix S*. Instead, one variable must be set exogenously and the corresponding equation must be dropped. Finally, the reduced set of linear equations (which contains m + n − 1 equations and the same number of variables) can be solved by multiplying it from the left by the inverse of the reduced coefficient matrix.
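A minimal Python sketch of this direct (non-iterative) solution, fixing the last column multiplier at zero, dropping the corresponding equation and then recovering X from (20), might look as follows; it assumes consistent margins and no all-zero rows or columns in A, and the function name is an illustrative choice.

```python
import numpy as np

def insd_direct(A, u, v):
    """Solve the not sign-preserving INSD model via the reduced system (26)."""
    S = np.abs(A)
    q = S.sum(axis=1)                      # q = S1
    w = S.sum(axis=0)                      # w = S^T 1
    g = u - A.sum(axis=1)                  # row discrepancies
    h = v - A.sum(axis=0)                  # column discrepancies
    m, n = A.shape
    S_star = np.block([[np.diag(q), S],    # coefficient matrix S* of (26)
                       [S.T, np.diag(w)]])
    rhs = np.concatenate([g, h])
    # S* is singular: fix tau_n = 0 and drop the last equation
    sol = np.linalg.solve(S_star[:-1, :-1], rhs[:-1])
    lam, tau = sol[:m], np.append(sol[m:], 0.0)
    return A + S * (lam[:, None] + tau[None, :])   # optimality condition (20)
```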

4 The iteration algorithm for solving the INSD-model

4.1 The solution algorithm suggested by the related literature

Regarding the iteration algorithm suggested by Huang et al. (2008) and Temurshoev et al. (2011), and following their advice to initialize the zi,j, λi and τj variables as zi,j(0) = 1, λi(0) = 0, τj(0) = 0 and to substitute these values into Eqs. (22) and (23) recursively (so that the λi-s computed from (22) are substituted into (23) to compute the τj-s), the first step of the iteration would be the following:

$$ \lambda_{i}^{{\left( {1} \right)}} = g_{i} /\Sigma_{{\text{j}}} |a_{i,j} | $$
(28)
$$ \tau_{{\text{j}}}^{{\left( {1} \right)}} = \left\{ {h_{j} {-}\Sigma_{{\text{i}}} (g_{i} /(\Sigma_{{\text{j}}} |a_{i,j} |) \cdot |a_{i,j} |)\} /\Sigma_{{\text{i}}} |a_{i,j} | = \{ h_{j} {-}\Sigma_{{\text{i}}} (g_{i} r_{i,j} )} \right\}/\Sigma_{{\text{i}}} |a_{i,j} | $$
(29)

formula, where the numerator is just the residual column discrepancy remaining after the first row-wise adjustment. Therefore

$$ |a_{i,j} | \cdot \tau_{{\text{j}}}^{{\left( {1} \right)}} = \left\{ {h_{j} {-}\Sigma_{{\text{i}}} (g_{i} r_{i,j} )} \right\} \cdot c_{i,j} $$
(30)

represents the changes made by the first column-wise adjustment.

It is easy to see from Eqs. (22) and (23), and to prove by mathematical induction, that in all further iteration steps λ and τ also represent the percentage row-wise and column-wise residual adjustment requirements, respectively, remaining after the previous adjustments; in other words, they result from applying formulas (28) and (29) of the ‘first’ iteration after ‘reinitializing’ the ai,j, ci,j, ri,j, gi and hj parameters.

By reconsidering the meaning of Eqs. (18), (22) and (23) in the light of the just analysed iteration algorithm we can say that in the n-th iteration for each pair of (i,j) indices the percentage change in the corresponding matrix element (zi,j) is the sum of the percentage change in the corresponding row- and column-totals still required after the first n − 1 iterations, minus the weighted average of these required row-total changes weighted by the reinitialized ci,j shares.

4.2 The iteration algorithm of the pure INSD-model

In the early 1990s Révész (2001) developed and used an algorithm (called at that time, perhaps somewhat misleadingly, ‘additive-RAS’) instead of the RAS in the case of zero (or close to zero) known (target) margins or negative reference matrix elements. In the first step, for each row this additive-correction algorithm distributes the difference between the target row total and the corresponding row total of the reference matrix proportionately to the row-wise absolute-value shares (as matrix R was defined above), according to the

$$ x_{i,j}^{{\left( {1} \right)\left( {\text{r}} \right)}} = a_{i,j} + g_{i}^{{\left( {1} \right)}} \cdot r_{i,j} $$
(31)

formula, where gi(1) = gi. Then a similar adjustment has to be done column-wise according to the

$$ x_{i,j}^{{\left( {1} \right)}} = x_{i,j}^{{\left( {1} \right)({\text{r}})}} + h_{j}^{{\left( {1} \right)}} \cdot c_{i,j} $$
(32)

formula, where hj(1) = vj − Ʃi xi,j(1)(r).

Theorem 1

The result of the additive-correction algorithm is identical to that of the suggested iterative solution algorithm of the INSD-model.

Proof

Considering (28), (29) and the definition of their categories, it is obvious that the first step of the INSD model’s iteration algorithm suggested by Huang et al. (2008) is absolutely the same as that of the additive-correction algorithm.

In general, the n-th iteration (i.e. which contains the n-th row-wise and n-th column-wise adjustment) is

$$ x_{i,j}^{{\left( {\text{n}} \right)\left( {\text{r}} \right)}} = x_{i,j}^{{({\text{n}} - {1})}} + g_{i}^{{\left( {\text{n}} \right)}} \cdot r_{i,j} $$
(33)

(where gi(n) = ui − Ʃj xi,j(n−1)) and

$$ x_{i,j}^{{\left( {\text{n}} \right)}} = x_{i,j}^{{\left( {\text{n}} \right)({\text{r}})}} + h_{j}^{{\left( {\text{n}} \right)}} \cdot c_{i,j} $$
(34)

where hj(n) = vj − Ʃi xi,j(n)(r). Based on this, the total change in the individual elements caused by the first n iterations (di,j(n) = xi,j(n) − ai,j) is

$$ d_{i,j}^{{\left( {\text{n}} \right)}} = \sum\limits_{k = 1}^{n} {\left( {g_{i}^{\left( k \right)} \cdot r_{i,j} { } + h_{j}^{\left( k \right)} \cdot c_{i,j} } \right)} = r_{i,j} \mathop \sum \limits_{k = 1}^{n} g_{i}^{\left( k \right)} + c_{i,j} \mathop \sum \limits_{k = 1}^{n} h_{j}^{\left( k \right)} $$
(35)

If the process converges then obviously its di,j(∑) = \(\underset{n\to \infty }{\mathrm{lim}}{d}_{i,j}^{(n)}\) limit value can be computed as

$$ d_{i,j}^{( \Sigma)} = r_{i,j} \cdot g_{i}^{( \Sigma)} + c_{i,j} \cdot h_{j}^{( \Sigma)} $$
(36)

where gi(∑) = \(\underset{n\to \infty }{\mathrm{lim}}{g}_{i}^{(\sum ,n)}:=\underset{n\to \infty }{\mathrm{lim}}\sum_{k=1}^{n}{g}_{i}^{(k)}\) and hj(∑) = \(\underset{n\to \infty }{\mathrm{lim}}{h}_{j}^{(\sum ,n)}:=\underset{n\to \infty }{\mathrm{lim}}\sum_{k=1}^{n}{h}_{j}^{(k)}\).

Since the di,j(∑) elements of the ‘final’ matrix should satisfy the row-total and column-total requirements (otherwise the adjustment process would continue by distributing the remaining discrepancy), summing the equations of (36) by j we get the following:

$$ g_{i} = \Sigma_{{\text{j}}} d_{i,j}^{(\sum )} = \Sigma_{{\text{j}}} (r_{i,j} \cdot g_{i}^{( \Sigma)} + c_{i,j} \cdot h_{j}^{( \Sigma)} ) = g_{i}^{( \Sigma)} \cdot \Sigma_{{\text{j}}} r_{i,j} + \Sigma_{{\text{j}}} (c_{i,j} \cdot h_{j}^{( \Sigma)} ) = g_{i}^{( \Sigma)} + \Sigma_{{\text{j}}} (c_{i,j} \cdot h_{j}^{( \Sigma)} ) $$
(37)

Similarly summing the equations of (36) by i we get the

$$ h_{j} = \Sigma_{{\text{i}}} d_{i,j}^{(\sum )} = \Sigma_{{\text{i}}} (r_{i,j} \cdot g_{i}^{( \Sigma)} + c_{i,j} \cdot h_{j}^{( \Sigma)} ) = \Sigma_{{\text{i}}} (r_{i,j} \cdot g_{i}^{( \Sigma)} ) + h_{j}^{( \Sigma)} \cdot \Sigma_{{\text{i}}} c_{i,j} = \Sigma_{{\text{i}}} (r_{i,j} \cdot g_{i}^{( \Sigma)} ) + h_{j}^{( \Sigma)} $$
(38)

conditions for the so far unknown gi(∑) and hj(∑) values. Equations (37) and (38) can be described in matrix algebraic notations as

$$ {\mathbf{g}} = {\mathbf{g}}^{( \Sigma)} + {\mathbf{Ch}}^{( \Sigma)} = \hat{\mathbf{q}}\hat{\mathbf{q}}^{{-{1}}} {\mathbf{g}}^{( \Sigma)} + {\mathbf{S}}\hat{\mathbf{w}}^{{ - {1}}} {\mathbf{h}}^{( \Sigma)} $$
(39)
$$ {\mathbf{h}} = {\mathbf{R}}^{{\text{T}}} {\mathbf{g}}^{( \Sigma)} + {\mathbf{h}}^{( \Sigma)} = {\mathbf{S}}^{{\text{T}}} {\hat{\mathbf{q}}} ^{{ - {1}}} {\mathbf{g}}^{( \Sigma)} + {\hat{\mathbf{w}}\hat{\mathbf{w}}}^{{ - {1}}} {\mathbf{h}}^{( \Sigma)} $$
(40)

respectively, where g(∑) and h(∑) mean the column vectors containing the elements of gi(∑) and hj(∑) respectively. Equations (39) and (40) can be combined in the

$$ \left[ {\begin{array}{*{20}c} {{\hat{\mathbf{q}}}} & {\mathbf{S}} \\ {{\mathbf{S}}^{{\text{T}}} } & {{\hat{\mathbf{w}}}} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\hat{\mathbf{q}}}^{ - 1} {\mathbf{g}}^{( \Sigma)} } \\ {{\hat{\mathbf{w}}}^{ - 1} {\mathbf{h}}^{( \Sigma)} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\mathbf{g}} \\ {\mathbf{h}} \\ \end{array} } \right] $$
(41)

system of inhomogeneous linear equations.

Comparing this with (26), we can see that both the coefficient matrix and the right-hand-side constant vector of (41) are the same as their counterparts in (26). Therefore, the solutions of the (26) and (41) sets of linear equations are the same too. This means that if λ, τ are the solution of (26), then those g(∑) and h(∑) vectors which satisfy the equations \({\hat{\mathbf{q}}}^{{ - {1}}} {\mathbf{g}}^{( \Sigma)} = {{\varvec{\uplambda}}}\) and \({\hat{\mathbf{w}}}^{{ - {1}}} {\mathbf{h}}^{( \Sigma )} = {{\varvec{\uptau}}}\), i.e. which can be computed as

$$ {\mathbf{g}}^{( \Sigma)} = {\hat{\mathbf{q} }}\lambda $$
(42)
$$ {\mathbf{h}}^{( \Sigma)} = \hat{\mathbf{{w}}}\tau $$
(43)

are the solutions of (41). By substituting (42) and (43) into (36) we obtain the

$$ d_{i,j}^{( \Sigma)} = r_{i,j} \cdot q_{i} \cdot \lambda_{{\text{i}}} + c_{i,j} \cdot w_{j} \cdot \tau_{{\text{j}}} = s_{i,j} \cdot \lambda_{{\text{i}}} + s_{i,j} \cdot \tau_{{\text{j}}} = |a_{i,j} | \cdot \left( { \, \lambda_{{\text{i}}} + \, \tau_{{\text{j}}} } \right) $$
(44)

formula for the resulting total changes (in the individual matrix elements) of the additive-correction algorithm. This is just the same as (21), the solution of the INSD-model.
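For practical use, a minimal Python sketch of the additive-correction iteration of formulas (31)–(34) might look as follows; it assumes that the reference matrix has no all-zero rows or columns, and the function name, tolerance and iteration cap are illustrative choices.

```python
import numpy as np

def additive_correction(A, u, v, tol=1e-10, max_iter=1000):
    """Distribute the remaining row and column discrepancies pro rata
    to the absolute-value shares of the reference matrix (formulas 33-34)."""
    S = np.abs(A)
    R = S / S.sum(axis=1, keepdims=True)   # row-wise absolute-value shares
    C = S / S.sum(axis=0, keepdims=True)   # column-wise absolute-value shares
    X = A.astype(float).copy()
    for _ in range(max_iter):
        g = u - X.sum(axis=1)              # remaining row discrepancies
        X = X + g[:, None] * R             # row-wise additive correction
        h = v - X.sum(axis=0)              # remaining column discrepancies
        X = X + h[None, :] * C             # column-wise additive correction
        if np.abs(u - X.sum(axis=1)).max() < tol:
            break
    return X
```

Since each row step restores the row totals exactly, the size of the row discrepancies re-created by the subsequent column step is a natural convergence measure, which is what the stopping rule above monitors.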

4.3 Illustration of the additive correction iteration algorithm

During its use of more than 25 years for various matrix adjustment problems arising in the calibration of multisectoral models, the additive-correction method mostly converged fast and the resulting matrix usually fitted the reference matrix well. To illustrate this, we present not one of our own exercises, usually made with large matrices, but the numerical examples of Huang et al. (2008) and Lemelin (2009) instead.

The numerical test confirmed that the additive-correction algorithm produces the same result as the one Huang et al. (2008) published as the optimal solution of the INSD-model, which they had found to be the estimation method with the best fit in terms of the AIL (absolute information loss) measure (computed to be 11.28).

Although in the case of sign flips we know little more about the mathematical characteristics of the additive-correction and INSD algorithms than what is said in Huang et al. (2008), the additive-correction algorithm yielded the following quite reasonable estimates (see Table 3) for the (somewhat extreme) numerical example of Table 1.

Table 3 Additive-correction estimates for the matrix of international investment positions

To evaluate the results, we have to bear in mind that the INSD-model (and hence its additive-correction algorithm) is not sign-preserving (as no least-squares-based model is); see the entries where sign changes occurred in bold italics. Furthermore, the INSD-model now produces some seemingly weird results: the sign of all elements of the 2nd row has changed! At first glance this seems quite odd, given that its original row total was already zero, so no row-wise adjustments (let alone sign flips) were required. However, after more careful inspection one can realize that in each column where the 2nd row had a negative element the column total had to increase tremendously, and where it had a positive element the column total had to decrease tremendously or even change sign. ‘To add insult to injury’, 3 column totals were expected to change sign!

Apart from these cases, one can see that almost all elements have changed in the right direction (i.e. so as to eliminate the discrepancy between the target and actual margins) and that the magnitudes of the individual (cell) changes are also reasonable.

Comparing our results with those of Lemelin (see Table 2 above, or Table 8 in Lemelin 2009), one can see that the different methods reacted to this controversial situation differently. The INSD-model tended to ‘believe’ that the required sign flips of all but one of the column totals meant that sign flips are also welcome within those columns. On the other hand, the K-L method fully exploited the fact that in each row and column of the initial matrix there was at least one negative element and at least one positive element. Therefore, it duly made the negative elements responsible for absorbing most of the required decreases and tended to allocate the required increases to the positive elements.

To compare the fit of the above estimates of the INSD-model and of the K-L method of Lemelin, the MAD (mean absolute deviation) statistics were computed. Although in this respect the INSD-model performed somewhat better (MADINSD = 4.583, MADK-L = 5.357), this could be expected from the facts that the MAD is similar to the minimand of the INSD-model and that the K-L measure is sign-preserving (which amounts to additional, although implicit, constraints). Therefore, it only shows that the two methods produce matrix estimates whose MAD indicator is somewhere in the normal range. The MARD (mean absolute relative deviation) statistics were also computed. As one could expect, the K-L measure based model performed significantly better than the INSD-model in this respect (MARDINSD = 1.17, MARDK-L = 0.47). Although it would be tempting to compare the results of the two models by some information loss measure too, as noted earlier, this is not applicable to models which produce sign flips.
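For reference, the fit statistics used above could be computed as in the following sketch, assuming that MAD is the mean of the absolute cell deviations and MARD is the mean of the absolute relative deviations taken over the cells with a nonzero reference value; these definitional details are assumptions of the sketch.

```python
import numpy as np

def mad(X, A):
    """Mean absolute deviation of the estimate from the reference matrix."""
    return float(np.abs(X - A).mean())

def mard(X, A):
    """Mean absolute relative deviation over cells with a nonzero reference value."""
    mask = A != 0
    return float((np.abs(X - A)[mask] / np.abs(A)[mask]).mean())
```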

Taking everything into account (also realizing the weird nature of the figures of the example), I think the true issue is to investigate in which circumstances one method is more usable/reliable and in which circumstances the other (or a third) one is (Table 3).

5 Testing the INSD-model on a practical numerical example

Finally, we present the performance of the additive-correction algorithm (and of the INSD-model) on an almost everyday problem. The problem is to estimate the various incomes and expenditures of given social groups, having a reference matrix (e.g. from household budget surveys) and knowing the corresponding totals of the whole household sector (as found in the national accounts). If we account for the savings of the households as expenditure, and record the incomes as positive while the expenditures as negative variables, the problem can be illustrated by Table 4.

Table 4 Initial matrix of the budget of the various household groups and prescribed margins

The results of the additive correction are shown in Table 5.

Table 5 Estimated matrix of the budget of the various household groups

As one can see, although the rows and columns had to be adjusted quite drastically, only 2 elements of the estimated matrix show sign flips. Both belong to the column of savings (in the rows ‘Group#1’ and ‘Group#2’), and this is just the category where one may expect such sign changes. It is also quite reasonable that the required upward adjustment of the total transfers took place almost exclusively in the row of Group#3, where the row sum had to be increased as well.

Clearly, the above example deserves further analysis and comparison with the results of alternative methods. Due to size limits, here we can only say that the INSD-model and its additive-correction algorithm are likely to perform even better in less extreme circumstances, characteristic of real-life problems.

6 Conclusion

The two-directional matrix adjustment methods can be applied in a growing number of areas, due to the precise formulation of the problem, the explored mathematical properties of the proposed methods, the development of computing (more efficient solving software), the accumulated international experience and the improved statistical data (which make it possible to produce a better reference matrix). Nevertheless, the most important conditions for finding the most suitable method and its successful application are the deep knowledge about the economic phenomena under investigation and about the statistical methodology behind the construction of the related reference matrix.

Obviously, many mathematical properties and relationships of these matrix adjustment models remain to be clarified. In the present article I focused on the mathematical characteristics of the matrix adjustment problem when some of the elements or margins are negative. I found the additive-correction algorithm to be identical to the iteration procedure suggested by Huang et al. (2008) for solving the INSD-model in the special, not sign-preserving case. It was also demonstrated that this simplified algorithm produces a solution which fits the reference matrix well and in which the elements of the matrix are adjusted in the expected direction.

Although powerful modern computers can solve such matrix adjustment problems formulated as nonlinear mathematical programming (optimization) problems, in such solutions it is not transparent how the results are related to the parameters of the model. This makes the further analysis of the results and the further development of the model harder. As opposed to this, using transparent iteration algorithms may reveal the role of certain parameters and constraints more clearly, especially by studying the results of the individual iteration steps more thoroughly.

In general, further research is needed to clarify the mathematical properties of this iteration algorithm and of related algorithms. For example, it is also worth investigating a modification of the above presented iteration algorithm of the INSD model in which the row- and column-discrepancies are dissipated not pro rata to the absolute values of the elements of the initial matrix, but pro rata to the absolute values of the elements of the k-th iterate of the estimated matrix. The algorithm can also be extended to the estimation of three- or more-dimensional arrays, in a similar way as is done with the RAS and GRAS algorithms (Holý and Šafr 2020; Valderas-Jaramillo and Rueda-Cantuche 2021).

In any case, such matrix adjustment methods can be applied not only in isolation, but also sequentially and built into more complex mathematical programming problems.