A not sign-preserving iteration algorithm for the ‘Improved Normalized Squared Differences’ matrix adjustment model

Estimating the elements of a matrix, when only the margins (row and column sums) are known, but a supposedly similar ‘reference matrix’ is available, is a standard problem in many disciplines. After discussing the main types, issues and applications of these two-directional matrix adjustment problems the paper concentrates on the case of negative matrix elements and models with quadratic objective functions. The solution of the Improved Normalized Squared Differences (INSD) model is proved to be the same as the result of that iteration algorithm which is presented in the paper. It is also argued that if the sign-preservation requirement is dropped then the iteration procedure suggested by Huang et al. (Econ Syst Res 20(1):111–123, 2008) boils down to the same algorithm. Using the numerical example of the earlier literature it is also demonstrated that even in this not sign-preserving case, which even requires sign-flips for some elements, the INSD-model produces good fit in mathematical terms.


Introduction
Estimating the elements of a matrix, when only the margins (row and column sums) are known, is a standard problem in many disciplines. The matrix adjustment problem most commonly discussed in the literature, which we will refer to as the two-directional matrix adjustment problem, can be formulated as follows: Let X * be an m x n unknown matrix, for which row and column sums are equal to the known u column vector and v column vector respectively (that is, X * 1 u, 1 T X * v, where 1 is the summation (column) vector and the T superscript denotes transpose).
If certain cells of X * are also known then by subtracting them from the(ir) corresponding row and column sums, the problem can be converted to the case of unknown cells. However, in most cases we have at most only indirect information about the value of the elements of the matrix. Generally, this indirect information is a reference matrix, for which we assume that it is some sense 'similar' to the target matrix. If we denote such a m x n reference matrix by A (also known as prior), then X * can be estimated with the matrix (also of the size m x n) denoted by X, which in the given sense is the most similar to the reference matrix A and which has row sums equal to the column vector u and column sums equal to row vector v (i.e., X1 u, 1 T X v).
Of course, depending on the definition and given formula (measure) of the 'similarity' (or the 'deviation' or 'distance') of the two matrices (A and X), it is possible that the problem has several solutions (X) for which this formula gives the same value. However, if A is indecomposable, the set of possible solutions is compact, and the objective function can be continuously differentiated over that set, there is only one solution, which is true for the methods involved here (de Mesnard 2011). 1 In any case, the matrix adjustment task can be defined as a mathematical programming problem, where the goal is to find the optimal value of the target function (the maximum of the similarity formula or the minimum of a monotonous increasing function of the deviation), subject to the constraints X1 u and 1 T X v (and possibly some nonnegativity or sign-preservation conditions 2 ). Such equalities constrained optimization problems can be solved by the Lagrange multiplier method. By composing the Lagrangian function, differentiating it by each variables (i.e. the unknown elements of the matrix and the Lagrange multipliers) and setting the resulting formulas to zero, we get the so called normal equations. In the case of the most well-known objective functions these normal equations can be formulated as a functional relationship between the estimated matrix (as dependent variable), the reference matrix and the Lagrange multipliers (independent variables). However, the Lagrangian multipliers are also variables to be determined only by the normal equations. Therefore, the solution of the normal equations would require a different formula. In a few cases an explicit formula (for the elements of the estimated matrix) exists. In many other cases the solution can be obtained by an iterative procedure.
The paper is structured as follows: Sect. 2 reviews the main and most widely used types of the constrained matrix adjustment methods. Section 3 gives a short overview of how matrices containing some negative elements were traditionally and in a simple way treated in the matrix adjustment problems. In addition, it describes how the entropymodels, the models with quadratic objective function and in particular the INSD-model deal with the negative elements and with the issues of the sign-preservation. This section also presents the linear algebraic solution of the INSD-model, and its iterative solution algorithm, in particular in the not sign-preserving case. Section 4 shows that when the sign-preservation requirement is omitted from the INSD-model, the iterationn procedure Huang et al. (2008) suggested for solving the INSD model boils down to a simple iteration procedure which the author of this paper have been using for decades and which-following the classification of de Mesnard (1990),-may be called 'additive-correction'-algorithm or perhaps more precisely additive absolute value-shares proportional correction algorithm. By numerical examples Sects. 4.3 and 5 demonstrate the merit of this algorithm and discusses the suggested scope of its applicability. In the Conclusion our findings are put in a broader context by emphasising the merit of having such transparent formulas describing the relationship of the estimated matrix, the reference matrix and the known margins. The importance of trying to obtain as good reference matrices as possible and combining these methods with other matrix adjustment and estimation methods are also emphasized.

The main types of objective functions of matrix adjustment
In the academic literature the matrix adjustment problem appeared about one hundred years ago, first as a matrix filling problem of cross-tabulations (or "contingency tables" as Pearson 1904 named it) of demographic data. In particular, the task was to update such tables (e.g. census data) to given (known, already updated) marginal totals or adjusting sample estimates of contingency matrices to known population values for their marginal totals (Deming and Stephan 1940).
Many researchers from several disciplines developed solution methods to this kind of matrix adjustment or matrix balancing problems, in some cases independently, being unaware of others' achievements (see a review of these in Lahr and de Mesnard 2004). That is why mathematically equivalent methods are called diversely in distinct disciplines.
The deviation of the estimated matrix from such reference matrices can be measured by various functions. In the academic literature two main type of such objective functions were developed: information (entropy) measure-based functions (which contain the logarithm function borrowed from the information theory) and error minimization functions, where the 'error' means the weighted sum of the positive (absolute value or squared) differences (in the latter case these are quadratic objective functions based on the principle of least squares) between the estimated values and their counterparts in the reference matrix.
In the literature a vast discussion emerged about which method and under what circumstances is more efficient and more reliable. According to experience so far, the best choice depends on the mathematical properties of the reference matrix, target margins, and expectations about the target matrix (non-negativity, zero values, sign switching, sparse matrix etc.), and the economic content of the matrix.

The entropy models
One of the earliest and still most popular entropy-model is the so-called RAS-method. Its objective function is the following (see in Bacharach 1970): By using the method of Lagrangian multipliers Bacharach (1970) proved that the solution of (1) subject to

X1 u,
( 2 ) can be expressed as the X rAŝ, (or in algebraic notations : x i, j a i, j · r i · s j for all i and j indices) (4) 'biproportional' formula (from which the model is called the 'RAS-model'), whered enotes the diagonal matrix of a vector, and r and s are the vectors generated from the Lagrangian multipliers of the constraints (2) and (3), respectively. He also showed that if matrix A is indecomposable, then the solution X rAŝ is unique, except for any arbitrary δ and 1/δ scalar multipliers for r and s vectors respectively. This means that if a r and s vector pair is a solution, then the r·δ and s/δ vector pair is also. Bacharach (1970) proved that the above solution of the RAS-model can be achieved by the following iterative proportional fitting procedure (called the RAS-algorithm): Let be the x i,j (0)(r) a i,j , w 1 T A, q A1, R q −1 A, C Aŵ −1 and compute for n 1, 2, … the following 2 formulas: (where g i (n) u i / j x i,j (n−1) ) and formulas, where h j Normally this iteration process converges so that lim n→∞ x (n) i, j a i,j · r i · s j (on the conditions for convergence see McGill 1977;Möhr et al. 1987).
The RAS-algorithm works even when some elements of A are zeros, and then the corresponding elements of X remain also zeros. In this case the solution can be computed by minimizing the m i 1 n j 1 x i, j ln(x i,j /a i,j ) sum subject to (2) and (3), where the summation should be restricted only to those indices where a i,j 0.
From the (5) and (6) formulas of the RAS algorithm it is also obvious that if A, u and v are non-negative then the estimated X will be also non-negative (a formal proof can be found in Bacharach 1965). In general, if a i,j 0 then x i,j should have the same sign as a i,j (i.e. x i, j a i, j > 0 should hold), otherwise the ln(x i,j /a i,j ) term can not be computed. Note, that even if some elements of A, u and v are negative, technically the RAS-algorithm still may work, but sign changes could occur, though their occurrence is extremely unlikely if u and v are non-negative (Günlük-S , enesen and Bates 1988).

Models with power functions
From the first mathematical discussion of the conditional matrix adjustment problem in the academic literature (Deming and Stephan 1940) various models with quadratic objective functions were proposed and analysed. In the most general case, the type of power-function can be used, where (in case of stochastic interpretation of the problem) v i,j may represent the variance of the i,j-th element of the X or A matrix depending on which is regarded to be stochastic (see Stone (1977) and Byron (1978), Günlük-S , enesen and Bates 1988). If v i,j a i,j q then (7) can be rewritten as However, in practice this formula is usually applied with the p 2 and q 1 or 0 settings. The p 1 case and the q 0 case have many drawbacks (e.g. the p 1 case requires the computation of the absolute value of the numerator but the resulting | x i,j − a i,j | function is not differentiable, while the q 0 case, even if applied to input-output coefficients thereby eliminating the bias resulting from the different size of the individual industries, still does not address the problem of the tendency to produce large proportional changes to small coefficients-see Lecomber 1975), therefore we deal only with the most common case when p 2 and q 1.
In this case the objective function can be written as follows: If in the case of contingency tables x i"j and a i,j represent two-dimensional probability distributions then the half of the above expression is the Pearson's χ 2 -statistic (see a detailed discussion of this stochastic interpretation in Smith 1947).
By introducing the z i,j x i,j /a i,j ratios (9) can be rewritten as the weighted sum of the squared sum of the percentage deviation of the x i,j elements from the corresponding a i,j elements: This objective function is attractive partly because the expression m i 1 n j 1 (x i, j −a i, j ) 2 in (9) can be interpreted as the square of the Euclidean distance of the vectorsâ andx, whereâ (a 1,1 , a 1,2 ,…, a 1,n , a 2,1 , a 2,2 ,…, a 2,n ,…, a m,1 , a m,2 ,…, a m,n ) andx (x 1,1 , x 1,2 ,…, x 1,n , x 2,1 , x 2,2 ,…, x 2,n ,…, x m,1 , x m,2 ,…, x m,n ), i.e. they are formed from A and X so that the rows of their respective matrices are rearranged end to end. More importantly from (9a) one can see that the weighting by a i,j represents a reasonable compromise (i.e. opting for a balanced risk of distorting the large or small entries) between not weighting at all the (z i,j − 1) 2 squared percentage differences and the (x i,j − a i,j ) 2 squared absolute differences.
Indeed, this objective function was already used by Deming and Stephan (1940) (p. 429 Eq. (3)) and Friedlander (1961) (p. 414) who duly applied the method of the Lagrangian-multipliers for the conditional optimizing problem consisting of (2), (3) and (9). Without the loss of generality, they halved the (9) objective function and hence derived the following simpler relationship between the optimal value of x i,j and the Lagrangian multipliers: where λ i is the multiplier of the discrepancy of the i-th row and τ j is the multiplier of the discrepancy of the j-th column (i 1, 2, …, m; j 1, 2, …,n).

Simple treatment of negative elements in matrix adjustment methods
By reviewing the academic literature on the margins constrained matrix adjustment methods we can enumerate the following methods regarding the simple treatment of negative elements in the reference matrix: 1. leaving them unchanged (that is if a i,j < 0 then setting x i,j to a i,j , i.e. x i,j a i,j ) 2. setting them to zero (either only in the reference matrix or directly so that x i,j 0) 3. setting the corresponding elements of the X matrix exogenously 4. aggregating the data so that the negative elements be added to more positive elements so that the aggregate figure be non-negative 5. reformulating the problem or introducing 'mirror accounts' (Lenzen et al. 2014) so that negative items disappear (e.g. in commodity balances rearranging the final demand block's negative stock change to positive stock change on the side of the sources as 'decreases in inventory') 6. imposing an explicit x i,j > 0 nonnegativity constraint and solving the model as a linear programming model by any related solvers 7. In the case of estimating a (quadratic) so-called Social Accounting Matrix (SAM), where the row and column margins are the same (u v) if the i,j-th element of the reference matrix is negative (the revenue of the i-th account from the j-th account), then it is treated instead as a negative revenue of the j-th account from the i-th account

Treatment of negative elements in entropy-models
Although the sign-preserving nature of the entropy-models made them popular among researchers dealing with various problems (in particular among economists who tried to update and project input-coefficient matrices) for a long time it prevented the application of the entropy models for matrices with negative element(s). The basic problem of the entropy models is (as pointed out by Günlük-S , enesen and Bates (1988) that if x i,j < 0 in (1) is negative then these 'weights' are 'counterproductive' in the sense that instead of penalizing the corresponding percentage deviation of x i,j from a i,j , they reward such deviations by subtracting the value of the corresponding ln(x i,j /a i,j ) terms from the penalty (objective) function.
Furthermore, if x i,j < a i,j then the ln(x i,j /a i,j ) term can be negative which makes the whole weighting inappropriate and the objective function meaningless. 3 To solve these problems Günlük-S , enesen and Bates proposed the following 'information change' measure as the objective function: Although this allows for negative x i,j and/or a i,j elements but still does not allow for sign flips. Moreover, these authors did not elaborate what algorithm can find its solution. Instead, they present a modified Lagrangian, which may be written correctly 4 as Note, that although they do not explain and it is not clear from the formula, but taking the logarithm of the Lagrangian multipliers is probably meant to harmonize them with the ln(x i,j /a i,j ) term. The solution of the above model is The above treatment of the negative elements expresses the principle that all elements have to be adjusted in the direction to eliminate the given (row-or column-) discrepancy. For example, when r i > 1 (i.e. when the actual row total is less than the prescribed one) then the absolute values of the negative elements have to be decreased by dividing them by r i . This principle we will employ in other models of this paper.
Strangely, in the next 15 years the findings of Günlük-S , enesen and Bates (1988) 'was little acknowledged in subsequent literature on updating input-output tables' so that Junius and Oosterhaven (2003) had to 'rediscover' it (Temurshoev et al. 2011). The objective function of their 'generalized RAS' (GRAS) method was ( 1 4 ) where z i,j x i,j /a i,j , so that z i,j > 0. However, this objective function cannot be applied if not all columns and rows have a positive element (Temurshoev et al. 2013), and even in normal cases it does not always give the best results among the estimation methods available (Lemelin 2009). Later on, Oosterhaven (2005) himself also pointed out that negative and positive differences can cancel each out in the originally proposed objective function (creating the illusion of perfect fit) and instead of it, he proposed the absolute information loss (AIL) function. Later again, Lenzen et al. (2007) corrected the GRAS objective function by replacing the ln(z i,j ) term by ln(z i,j /e), where e is the Euler-number (the base of the natural logarithm). 5 Huang et al. (2008) further changed the objective function to and named this 'improved GRAS' (IGRAS). This they got by adding a constant to the function, with which in the case of z i, j 1 it will be zero, but this does not affect the location of the optimum. In any case, it can be seen from the logarithm in the objective function that if the model has any solution, for every i, j pairs x i,j /a i,j z i,j ≥ 0 holds. Thus, the solution of the GRAS model guarantees that matrix elements preserve their sign. Huang et al. (2008) duly write the associated Lagrangian function of (15) and derive the following optimality conditions: where λ i and τ j are the Lagrangian multipliers belonging to the row-and column sum deviations. Since the exponential function is always positive, from this it also can be seen that the solution of the IGRAS-model must be sign preserving (if exists at all). If we use the r i e λ i and s j e τ j notations, the solution can be rewritten as z i,j r i · s j if a i,j > 0 and z i,j (1/r i )·(1/s j ) if a i,j < 0, much similar to what Günlük-S , enesen and Bates (1988) produced (see Eq. (9)), but of course with different values for r i and s j . From this form of the solution it also can be seen easily that the resulting adjustment treats the positive and negative elements differently, so that both should change in the direction to eliminate the row-and column-discrepancies.
However, r i and s j can be computed only by a complicated iteration process (Junius and Oosterhaven 2003;Temurshoev et al. 2013).
A problem of the entropy models is that the deviation of x i,j from a i,j is not treated symmetrically (as Huang et al. (2008) put it: the objective function is 'biased') and hence the entropy measure can not be interpreted as their 'distance'.
Although models with such logarithmic terms seem to be attractive due to their apparent relationship with information theory, the division of x i,j by a i,j can hardly be interpreted as some entropy measure especially since a i,j can not be regarded to be a realisation of any probability variables. Indeed, there are some attractive (in some cases only partial) interpretations for the chosen objective functions (see for example Bacharach (1970) pp 83-84) but they are debatable and there is not a clear rule which one to accept (see for example McDougall's (1999) critique about the misinterpretation of the concept of 'cross-entropy' by some modelers). Lemelin (2009) compares the GRAS-model with the cross-entropy measure of information loss of Kullback and Leibler (1951), to be referred to as K-L measure. To make it meaningful for negative matrix elements too, he redefines the q i,j 'a priori probabilities' and p i,j 'a posteriori probabilities' in the m i 1 n j 1 p i, j ln p i, j /q i, j minimand so that they be the absolute value shares of the given elements in their grand total: q i,j |a i,j |/ i j |a i,j | and p i,j |x i,j |/ i j |x i,j |. Based on this he observes the following: Property due to Lemelin (2009): The GRAS objective function of Junius and Oosterhaven (2003) is not exact representation of the K-L measure.
He demonstrates this by simple algebraic calculus, taking into account that in the case of negative elements the i j |x i,j | grand total is not fixed. He illustrates the difference by several numerical examples starting from the example of Junius and Oosterhaven (2003). Then it is first modified (by changing the prescribed marginal totals and the sign of some elements) so that it can be interpreted as a net world trade matrix, then by modifying further the prescribed marginal totals appropriately, it is reinterpreted as the matrix of net investment positions, where a i,j represents the net asset of the j-th country from i-th category. In this latter case, the row sums of X must be zero and some of the column sums must be negative. Table 1 shows the initial matrix and the required row and column sums. 6 First, by applying the GRAS-method to these data, he found that the figures of the resulting matrix 'are out of proportion with the initial values' (i.e. which are too large in absolute value). Then by applying the above K-L measure to the same data he obtained the following results (see Table 2 here, and Table 8 in Lemelin 2009). These results will be compared with the results of the INSD model to be discussed subsequently in the present paper.

Treatment of negative elements in models with quadratic objective function
Clearly, quadratic objective functions like (9) can deal with negative elements in the reference matrix and do not prevent sign flips. However, in many applications, i.e. when the reference matrix and/or the matrix to be estimated is non-negative, the modelers regarded the possibility of sign-flips to be a drawback. Henry (1973) and (1974)-while (by formally presenting the related Lagrangian function) reproduces the (7) relationship between the initial and estimated matrix-argues that it is rather unlikely that the 1 + λ i + τ j sum is negative, therefore the (9) objective function also tends to preserve the sign of the matrix elements.
What matters more for us about Henry's second paper that he tells that he had received some ideas and proposals from Lecomber (see Lecomber 1971) regarding the merit of using the absolute value of a i,j in the denominator of the (9) model, which in this case could be written as However, apparently Henry did not realise the importance of this proposal. First he just demonstrates that in the case of some negative a i,j elements the original (9) minimand does not produce the optimal solution (since if a i,j < 0 then the corresponding 1/a i,j > 0 s-order condition is not met) and then just gives a numerical example by which he tries to convince the reader that even in the case of negative a i,j elements the original (9) minimand results in, although not optimal, but saddle-point like solution in which the (only) 'negative entry has received fair and equitable treatment'. In his example it meant that when all margins were doubled then the negative element has also been doubled in the estimated matrix. But just this is what has to be avoided in many cases! Indeed, in the case of any 'net' economic variable that is a residual, such as household savings (income minus consumption expenditures), inventory changes, etc. one has to try to estimate those two categories, whose difference is the given net variable. Such 'net' economic variables tend to be extremely volatile, because the underlying gross flows may change in an uncoordinated manner. So net flows are almost impossible to model; it is largely preferable to model the underlying gross flows. If this is not possible (e.g. due to data unavailability, as in the case of the 'changes in inventories') and the behavior of these categories are rather different, then the matrix adjustment techniques are seldom applicable. However, there are many cases when the 'net' variable behaves reasonably, i.e. in the same direction as the corresponding row-and column-totals.
For example, in the case of devaluation of the domestic currency (which decrease the domestic demand and increases the incentive/pressure to sell abroad) one may expect that-without having detailed information on the changes in the patterns of exports and imports-the trade balances (export minus import) of each commodity improve, either so that the hitherto positive balance increases further, or the originally negative balance shrinks or even turns positive. In other words, the balances do not change proportionately as Henry would regard it to be 'fair' and 'equitable'.
In any case, Lecomber did not seem to be satisfied with Henry's answer and in a book chapter written by him (see Lecomber 1975) he remarks that in the case of negative elements 'The Friedlander minimand given by Henry (his Eq. 3) also breaks down, but a simple modification-replacing a i,j in the denominator by | a i,j |-gives satisfactory answers'. 7 But to support this claim he just gives the following vector-analogue example: 'consider adjusting the vector (− 1, 2) to sum of 4. Pro-rata adjustment (analogous to RAS) gives (− 4, 8), while the modified Friedlander adjustment gives (1, 3). ' The problem of this remark is that Lecomber's example is wrong. The resulting vector is not the (1, 3) (for which the minimand is still 4.5 large) but the (0, 4) (for which the minimand is minimal, i.e. 3). Also note, that to get the 'modified Friedlanderadjustment' solution of such vector-(or matrix) analogous problems, one can use not only the related linear programming model solution algorithms but also the much simpler but so far unnoticed solution algorithm which we are going to discuss in Sect. 4.2.
Unfortunately, Lecomber made his remark only in a footnote and apparently did not impress neither Henry nor others too much.
Sign preservation also can be built into models containing quadratic objective functions. Jackson and Murray (2004), for example, use the following equivalent form of the (17) minimizing criterion (see their Model 10) where z i,j x i,j /a i,j , subject to the usual marginal conditions and the z i,j ≥ 0 nonnegativity constraints. Due to the inequality conditions of this so-called 'sign-preserving squared differences' model it can not be solved using the simpler set of scaling techniques but can be solved only by commercial mathematical programming software packages (Lahr and de Mesnard 2004;Huang et al. 2008). However, in such solutions it is not transparent how the results are related to the parameters of the model. This makes the further analysis of the results and the further development of the model harder.

The Improved Normalized Squared Differences model
Maybe that is why Huang et al. (2008)-who call (17)  (which is positive if z i,j < 0) produces a sufficiently large penalty value to prevent this to happen. We may refer to this model as the Sign-preserving Improved Normalized Squared Differences (SINSD) model. Then they derive the following optimality condition: where λ i and τ j are the Lagrangian multipliers of the row-and column sum deviations. To solve this system of simultaneous nonlinear equations they suggest the following iteration algorithm: Let us initialize z i,j 1, λ i 0, τ j 0, and calculate them with Eqs. (18), (18a) and (18b) iteratively. They claim that finally one gets the solution for z i,j . Although they applied this for a numerical example, the reader could not find out how this iteration process could produce the published results. Only Temurshoev et al. (2011) (who also corrected the sign of the row and total discrepancies in their Lagrangian) corrected the above sequence of the iteration steps by clarifying that first the λ i -s have to be computed from (18a), then these have to be substituted into (18b) to compute the τ j -s and only finally have to be computed the z i,j − s by (18).
Note, that Huang et al. (2008) derive the following alternative optimality condition (see their Eq. (25) which-naturally together with (2) and (3)-can be used instead of (18), (18a) and (18b): As it could be expected from the titles of their articles, Huang et al. (2008) and Temurshoev et al. (2011) were interested in models which are sign-preserving. However, they observe that since the INSD objective function is the first-order Taylor-series approximation of the sign-preserving IGRAS objective function, it also likely produces sign-preserving results.
After discussing reasons for sign-changes in inventories (which products and how often change sign in the related column of the input-output tables), Lenzen et al. (2014) reverse the general negative judgement of the sign-flips and in certain cases regard it to be even desirable.
If in this spirit we remove the (sign-preserving) M parameter from (19) we get the simpler optimality condition for the INSD model, which is equivalent to the z i,j 1 + sgn(a i,j )· (λ i + τ j ) part (case M 0) of (18). It appears from this latter form, that for computing the optimal z i,j , the Lagrange multipliers should be added in the case of positive, and should be subtracted in the case of negative a i,j elements. Note, that it is apparent from formula (20) that it leads to the same result for x i,j when any ϕ scalar is added to λ i and on the other hand subtracted from τ j . replacing λ i + ϕ and replacing τ j with τ j − ϕ, they still satisfy (20). So this means that while in the case of the 'multiplicative' RAS, the degree of freedom of the Lagrangian multipliers is manifested in a proportionality factor, while for the INSD in an additive component. In view of (20) we may call the INSD-model 'A + R|A| +|A|S model', where |A| denotes the matrix containing the absolute values of the elements of A.
Introducing d i,j x i,j − a i,j , subtracting a i,j from each side of (20) we obtain expressing the relation between the Lagrangian multipliers and the (optimal) changes of the elements of the matrix. Note, that although the INSD model is not strictly biproportional, both the row-wise and column-wise adjustments of (changes in) the given elements are proportional to the absolute value of the given entry of the reference matrix. Equation (20) together with (2) and (3) form a system of linear equations with n·m + n + m equations and the same number of variables. However, one equation and variable can be dropped since the conditions of (2) and (3) are linearly interdependent provided they are consistent (as already pointed out by Deming andStephan 1940 anddiscussed further by Friedlander 1961) and since one of the λ i and τ j variables can be chosen arbitrarily (as demonstrated above by showing that if λ i and τ j represent a solution, then λ i + ϕ and τ j − ϕ too, whatever ϕ scalar is used). Therefore, the 'truncated' system of independent linear equations can be solved by any of the known methods.
However, the x i,j , λ i and τ j variables need not be determined simultaneously by such a big system of linear equations. The λ i and τ j variables can be solved first, and then the x i,j variables can be computed easily afterwards (e.g. using Eq. (20)). This is demonstrated by the following considerations: In the not sign-preserving case (i.e. if M 0) Eqs. (18a) and (18b) boil down respectively to the 'normal equations' which are similar (apart from using the |a i,j | absolute values instead of the a i,j -s is equivalent) to Eq. (21) of Deming and Stephan (1940) or Eqs. (6) and (7) of Friedlander (1961) and can be solved as a system of linear equations. Since in this (M 0) case λ i and τ j depend only on each other we have to compute x i,j from (20) only after the solution for λ i and τ j is found. Now take S |A|, w 1 T S, q S1, and let us redefine R q −1 S and C Sŵ −1 ; where R and C are matrices containing the row-and column-wise shares (structure) of S.
Denote g i u i − j a i,j and h j v j − i a i,j the differences of the prescribed row and column totals from those of the matrix A.
Using the recently introduced notations and by multiplying Eqs. (22) and (23) by q i j |a i,j | and w j i |a i,j | respectively we get their system of inhomogeneous linear equations, where λ, τ, g and h are the column vectors containing the λ i , τ j , g i and h j elements respectively.
Since 1 T g 1 T h,q1 q S1 andŵ1 S T 1, therefore the following holds: Since (27) can be interpreted as a solution of a homogenous system of linear equations, its q S S T w (symmetric) coefficient matrix (denoted subsequently by S * ) is singular (i.e. its rows/columns are linearly interdependent). Therefore the (26) system of linear equations cannot be solved by multiplying it from the left by the (non-existent) inverse of the S * matrix. Instead, one must set one variable exogenously and the corresponding equation must be dropped. Finally, the reduced set of linear equations (which contains (m + n − 1) equations and the same number of variables) can be solved by multiplying it from the left by the inverse of the reduced coefficient matrix.

The solution algorithm suggested by the related literature
Regarding the iteration algorithm suggested by Huang et al. (2008) andTemushoev et al. (2011) and following their advice to initialize the z i,j , λ i and τ j variables by 0 and substituting these values into Eqs. (22) and (23) recursively (so that the λ i -s computed from (22) are substituted into (23) to compute the τ j -s) the first step of the iteration would be the following: formula, where the numerator is just the residual column discrepancy remaining after the first row-wise adjustment. Therefore represents the changes made by the first column-wise adjustment. It is easy to see from Eqs. (22) and (23) and to prove by mathematical induction that in all further iteration steps λ and τ also represent the percentage row-wise and column-wise residual adjustment requirements respectively (remaining after the previous adjustments or in other words applying the (28) and (29) formulas of the 'first' iteration but after 'reinitializing' the a i,j , c i,j , r i,j , g i and h j parameters).
By reconsidering the meaning of Eqs. (18), (22) and (23) in the light of the just analysed iteration algorithm we can say that in the n-th iteration for each pair of (i,j) indices the percentage change in the corresponding matrix element (z i,j ) is the sum of the percentage change in the corresponding row-and column-totals still required after the first n − 1 iterations, minus the weighted average of these required row-total changes weighted by the reinitialized c i,j shares.

The iteration algorithm of the pure INSD-model
In the early '90s Révész (2001) developed and used an algorithm (called at that time perhaps somewhat misleadingly 'additive-RAS') instead of the RAS in the case of zero (or close to zero) known (target) margins or negative reference matrix elements. In the first step this additive-correction algorithm for each row distributes the difference of the target row total and the corresponding row-total of the reference matrix proportionately to their row-wise absolute value share (as matrix R was defined above) according to the formula, where g i (1) g i . Then a similar adjustment has to be done column-wise according to the Theorem 1 The result of the additive-correction algorithm is identical to that of the suggested iterative solution algorithm of the INSD-model.
Proof Considering (28), (29) and the definition of their categories, it is obvious that the first step of the INSD model's iteration algorithm suggested by Huang et al. (2008) is absolutely the same as that of the additive-correction algorithm.
In general, the n-th iteration (i.e. which contains the n-th row-wise and n-th columnwise adjustment) is where h j r) . Based on this the total change in the individual elements, caused by the first n iteration (d i,j (n) If the process converges then obviously its d i,j ( ) lim n→∞ d (n) i, j limit value can be computed as j . Since the d i,j ( ) elements of the 'final' matrix should satisfy the row-total and column-total requirements (otherwise the adjustment process would continue by distributing the remaining discrepancy), summing the equations of (36) by j we get the following: Similarly summing the equations of (36) by i we get the conditions for the so far unknown g i ( ) and h j ( ) values. Equations (37) and (38) can be described in matrix algebraic notations as respectively, where g ( ) and h ( ) mean the column vectors containing the elements of g i ( ) and h j ( ) respectively. Equations (39) and (40) can be combined in the q S S Tŵ system of inhomogeneous linear equations. Comparing this with (26) we can see that both the coefficient matrices and the right-hand-side constant vectors are the same as their counterpart in (26) and (41). Therefore, the solutions of the (26) and (41) set of linear equations are the same too. This means that if λ, τ are the solution of (26), then those g ( ) and h ( ) vectors which satisfy the equationsq −1 g ( ) λ andŵ −1 h ( ) τ, i.e. which can be computed as are the solutions of (41). By substituting (42) and (43) into (36) we obtain the formula for the resulting total changes (in the individual matrix elements) of the additive-correction algorithm. This is just the same as (21), the solution of the INSDmodel.

Illustration of the additive correction iteration algorithm
During its more than 25-years use for various matrix adjustment problems of the calibration of the multisectoral models the additive-correction method mostly converged fast and the resulting matrix usually fit well to the reference matrix. To illustrate this, we present not one of our exercises, usually made with large matrices, but with the numerical examples of Huang et al (2008) and Lemelin (2009) instead.
The numerical test confirmed that the additive-correction algorithm produces the same result as what Huang et al. (2008) published as optimal solution of the INSDmodel, which they had found to be the estimation method with the best fit in terms of the AIL (average information loss) measure (which was computed to be 11.28).
Although in the case of sign-flips we know little more of the mathematical characteristics of the additive-correction and INSD algorithms than what is said in Huang et al. (2008) the additive-correction algorithm yielded the following quite reasonable estimates (see in Table 3) for the (somewhat extreme) numerical example of Table 1.
To evaluate the results, we have to bear in mind that the INSD-model (and hence its additive-correction algorithm) is not (as all least squares based models are not) signpreserving (see the entries where sign changes occurred in bold italics). Furthermore, now the INSD-model produces some seemingly weird results: the sign of all elements of the 2 nd row have changed! At the first glance it seems to be quite odd, given that its original row total was already zero, therefore no row-wise adjustments (not to say sign-flips) were required. However, after more careful inspection one can realize that each column where in the 2 nd row there was a negative element the column totals had to increase tremendously, and where there were positive elements the column totals had to decrease tremendously even change sign. 'To add insult to injury' 3 column totals were expected to change sign! Apart from these cases one can see that almost all elements have changed in the right direction (i.e. to eliminate the discrepancy between the target and actual margins) and the magnitudes of the individual (cell) changes are also reasonable.
Comparing our results with those of Lemelin (see Table 2 above or in Table 8 in Lemelin 2009) one can see that the different methods reacted to this controversial situation differently: The INSD-model tended to believe that the required sign-flips of all but one column totals meant that sign-flips are also welcome in those columns. On the other hand, the K-L method fully exploited the fact that in each row-and column of the initial matrix there was at least one negative element and at least one positive element. Therefore, it duly made responsible the negative elements for absorbing most of the required decreases and tended to allocate the required increases to the positive elements.
To compare the fit of the above estimates of the INSD-model and the K-L method of Lemelin the MAD (mean average deviation) statistics were computed. Although in this respect the INSD-model performed somewhat/significantly better (MAD INSD 4.583, MAD K-L 5.357), this could be expected from the facts that the MAD is similar to the minimand of the INSD-model and the K-L measure is sign-preserving (which amounts to additional, although implicit constraints). Therefore, it only shows that the two methods produce such matrix estimates, the MAD indicator of which is somewhere in the normal range. The MARD (mean average relative deviation) statistics were computed. As one could expect, the K-L measure based model performed significantly better than the INSD-model (MARD INSD 1.17, MARD K-L 0.47). Although it would be tempting to compare the results of the two models by some information loss measure too, as noted earlier, this is not applicable for such models which produce sign-flips.
Taking everything into account (also realizing the weird nature of the figures of the example) I think the true issue could be to investigate that in which circumstances one method is more usable/reliable and in which circumstances the other (or a third) one (Table 3).

Testing the INSD-model on a practical numerical example
Finally, we present the performance of the additive-correction algorithm (and of the INSD-model) in the case of an almost everyday problem. The problem is to estimate the various incomes and expenditures of given social groups by having a reference matrix (e.g. from household budget surveys) and knowing the corresponding totals of the whole household sector (as found in the national accounts). If we account the savings of the households as expenditure and account the incomes as positive while the expenditures as negative variables, the problem can be illustrated by Table 4.
The results of the additive-correction is shown in Table 5.
As one can see that although the rows and columns had to be adjusted quite extremely, only 2 elements of the estimated matrix show sign-flips. Both belong to the column of savings (in lines 'Group#1 and 'Group#2'), and just this is the category where such sign-changes one may expect. It is also quite reasonable that the required upward adjustment of the total transfers took place almost exlusively in line Group#3, where the row sum had to be increased as well. Clearly, the above example deserves further analysis and comparison with the results of alternative methods. Due to size limits here we may say only that the INSD-model and its additive-correction algorithm is likely to perform even better in less extreme circumstances, characteristic of real-life problems.

Conclusion
The two-directional matrix adjustment methods can be applied in a growing number of areas, due to the precise formulation of the problem, the explored mathematical properties of the proposed methods, the development of computing (more efficient solving software), the accumulated international experience and the improved statistical data (which make it possible to produce a better reference matrix). Nevertheless, the most important conditions for finding the most suitable method and its successful application are the deep knowledge about the economic phenomena under investigation and about the statistical methodology behind the construction of the related reference matrix.
Obviously, many mathematical properties and relationships of these matrix adjustment models must be clarified yet. In the present article I focused on the mathematical characteristics of the matrix adjustment problem when some of the elements or margins are negative. I found the additive correction algorithm of the INSD-model to be identical to the iteration procedure suggested by Huang et al. (2008) in the special, not sign-preserving case. It was also demonstrated that this simplified algorithm produces a solution which fits well to the reference matrix and in which the elements of the matrix are adjusted in the expected direction.
Although powerful modern computers can solve such matrix adjustment problems formulated as nonlinear mathematical programming (optimization) problems, in such solutions it is not transparent how the results are related to the parameters of the model. This makes the further analysis of the results and the further development of the model harder. As opposed to this, using transparent iteration algorithms may reveal the role of certain parameters and constraints more clearly, especially by studying the results of the individual iteration steps more thoroughfully.
In general, further research is needed to clarify the mathematical properties of its iteration algorithm and related algorithms too. For example, the modification of the INSD model's above presented iteration algorithm so that the row-and columndiscrepancies be dissipated not pro-rata of the absolute value of the given elements of the initial matrix, but pro-rata of the absolute value of the given elements of the k-th iteration of the estimated matrix, is also worth investigating. The algorithm can be extended also to the estimate of 3 or more dimensional arrays in a similar way as it is done with the RAS and GRAS algorithm (Holý and Šafr 2020; Valderas-Jaramillo and Rueda-Cantuche 2021).
In any case, such matrix adjustment methods can be applied not only in isolation, but also sequentially and built into more complex mathematical programming problems.
Funding Open access funding provided by Corvinus University of Budapest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/ by/4.0/.