A new price index for multi-period and multilateral comparisons

Within the stochastic approach, this paper establishes a closed-form solution to the price index problem for an arbitrary number of periods or countries. The index’s reference basket merges the intersections of all pairs of baskets in all periods/countries and provides an effective commodity coverage. Under spherical regression errors, the index satisfies the Geary–Khamis equation system and, as such, offers a general and compact representation of the latter as well as the inferential framework as a dowry. Furthermore, by relaxing sphericalness in favor of a more realistic assumption of commodity-dependent variances, a broader result is achieved. The solution to the price index problem thus obtained encompasses the Geary–Khamis formulation and sows the seeds for further advances.


Introduction
The paper works out a multi-period/multilateral price index to effectively compare sets of commodities over time and/or across countries. This is done by using a regression framework and a reference basket based on the set of all commodities that appear in at least two periods/countries. A linear model in the deflator indexes and reference prices is specified, and estimation is performed by the method-of-averages argument. The resulting price index estimator follows as a by-product and enjoys fitting properties in between least squares and least absolute values. The price index constructed in this manner belongs to the family of stochastic indexes, which can be traced back to the works of Jevons (1863, 1869) and Edgeworth (1887, 1925). Its computation requires the knowledge of the quantities and values of the commodities included in the basket. Neither the model's specification nor the estimation method used explicitly calls for prices. This aspect becomes particularly useful when the prices of some commodities in specific periods/countries are not known. In the multi-period case, this can occur when some commodities enter/exit the basket due to a shift in consumer attitude, while in a multilateral perspective, it could happen when a commodity is not present or ceases to be present on the market in a given country.
The reference basket of the index at hand is the result of the union of the intersections in pairs of the specific baskets for each period/country. The absence of a commodity in a given period/country simply means that both its quantity and value are set equal to zero in that period/country. Thus, just like hedonic (Pakes 2003; Brachinger et al. 2018), GEKS (Balk 2012) and country/time-product-dummy (CPD/TPD) approaches with an incomplete price tableau (Rao and Hajargasht 2016; Weinand 2021), the index does not drop items which are not present in all countries/periods. A commodity contributes to the index's construction provided it occurs with non-null quantities in at least two periods/countries. The reference basket is, therefore, more inclusive and representative than baskets commonly referred to in the extant literature. Consequently, the index proves effective in a multi-period and/or multilateral framework and compares favorably with the Rao and Hajargasht (2016), GEKS and Ivancic et al. (2011) indexes.
Under sphericalness of errors in the regression model underlying the index estimator, the latter tallies with the Geary-Khamis (GK) index in a temporal setting and provides both a general closed-form representation of the latter and an inferential statistical apparatus as a dowry.
Moreover, the paper proposes an additional index under the assumption of commodity-dependent variances of the regression errors. This provides a more extended solution to the price index problem, which encompasses the Geary-Khamis index as a special case and paves the way to further generalizations. In addition, the so-called reference prices, namely the prices expected to be paid for commodities in the base period/country, are easily obtainable as a spin-off of the regression estimation and can be conveniently used to evaluate the prices of those commodities that, for whatever reason, are not observable in a given period/country.
The paper is organized as follows. Section 2 formally states the problem of finding a multi-period/multilateral price index starting from the quantities and values of the commodities in each period/country in the basket/index. The regression model uses deflators and reference prices as parameters to be estimated. Section 3 shows how the deflators are estimated using the method of averages and how the estimator of the price indexes is obtained from the deflator estimates as a by-product. In Sect. 4, the hypotheses on the errors of the parent regression model are relaxed, allowing the variance to be commodity dependent. In Sect. 5, the novel index formula and the TPD index are compared, in the case of a complete and an incomplete price tableau, using simulated data. Here we show the effectiveness of our method of estimating missing prices using reference prices in comparison with the standard techniques based on the imputation of missing values. Some concluding remarks are made in Sect. 6. An appendix provides the proofs of the statements in Sect. 3.

Formulation of the price index problem
In order to set up the regression framework needed to work out the index, let us consider the $N \times T$ matrices $V = [v_{n,t}]$, $Q = [q_{n,t}]$ and $P = [p_{n,t}]$ whose entries are values, quantities and prices, respectively, of a basket of N goods in T periods (or countries). Such a basket is the union of the intersections in pairs of the baskets in the T periods (or countries), and we assume that it covers all (and only those) commodities which occur with non-null quantities in at least two periods (or countries). The matrices V, Q and P satisfy the equality

$$V = P * Q \tag{1}$$

where $*$ is the Hadamard-product symbol. The price index problem can be read as the problem of approximating the matrix P by the outer product $\pi\lambda'$ of a vector of reference prices, $\pi > 0_N$, and a vector of price indexes, $\lambda > 0_T$, that is,

$$P \approx \pi\lambda' \tag{2}$$

So, the equality in Eq. (1) can be reformulated as follows

$$V = D_{\pi} Q D_{\lambda} + H \tag{3}$$

where $D_a$ denotes a diagonal matrix whose diagonal entries are the elements of the vector a and the matrix H accounts for discrepancies. Post-multiplying Eq. (3) by $D_{\delta} = D_{\lambda}^{-1}$ yields

$$V D_{\delta} = D_{\pi} Q + E \tag{4}$$

where $E = H D_{\delta}$ is a matrix of disturbance terms. Henceforth, in the wake of Theil (1960), a sphericalness (i.e., constant variance and uncorrelatedness) assumption will be made for the error components in (4), namely for the entries of the matrix E. The issue of possibly relaxing this assumption by dropping the hypothesis of constant variance across commodity errors will be addressed in Sect. 4. The vector $\delta$, whose elements are the reciprocals of the elements of the price-index vector $\lambda$, is the deflator/exchange rate vector. Taking t = 1 as the base period (country), namely $\lambda_1 = \delta_1 = 1$, and partitioning V, Q, E and $D_{\delta}$ as follows

$$V = [v_1, V_2], \quad Q = [q_1, Q_2], \quad E = [\varepsilon_1, E_2], \quad D_{\delta} = \begin{bmatrix} 1 & 0' \\ 0 & D_{\tilde{\delta}} \end{bmatrix} \tag{5}$$

we pass from the matrix Eq. (4) to the system

$$v_1 = D_{q_1}\pi + \varepsilon_1, \qquad V_2 D_{\tilde{\delta}} = D_{\pi} Q_2 + E_2 \tag{6}$$

The latter can be written in stacked form as follows

$$y = X\beta + \varepsilon \tag{7}$$

by setting y, X, $\beta$ and $\varepsilon$ as in Eq. (8). There, $\otimes$ is the Kronecker product symbol, vec is the stacking operator, and $R_j$ is the matrix given in Eq. (9) (Faliva 1996), where $e_i$ denotes the ith elementary vector. The model in Eq. (7) together with a sphericalness hypothesis for the error terms is a classical linear regression model. The estimation of the parameters will be accomplished in the next section by using the Method of Averages, in short MA (Květoň 1987). The estimator so obtained turns out to be an instrumental-variable (IV) estimator (Goldberger 1964) with a binary instrument matrix and enjoys the desirable properties of both MA and IV inferential procedures.
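The stacking step can be illustrated with a short Python sketch. It builds a response vector and a regressor matrix from V and Q with the first period as base. The block layout chosen for X (one column of negated values per non-base period, followed by diagonal quantity blocks) is our reading of the deflated system and should be taken as an assumption rather than as the paper's own Kronecker-product expression; the helper name `stack_model` and the toy numbers are hypothetical.

```python
import numpy as np

def stack_model(V, Q):
    """Stack the deflated system period by period (base period = column 0).

    Assumed parameter ordering: beta = [delta_2..delta_T, pi_1..pi_N].
    """
    N, T = V.shape
    y = np.concatenate([V[:, 0], np.zeros(N * (T - 1))])
    X = np.zeros((N * T, (T - 1) + N))
    X[:N, T - 1:] = np.diag(Q[:, 0])          # base period: v_1 = D_{q_1} pi + error
    for t in range(1, T):
        rows = slice(t * N, (t + 1) * N)
        X[rows, t - 1] = -V[:, t]             # -v_t * delta_t
        X[rows, T - 1:] = np.diag(Q[:, t])    # + D_{q_t} pi
    return y, X

# Hypothetical noise-free data: V = D_pi Q D_lambda
pi = np.array([2.0, 3.0, 5.0])
lam = np.array([1.0, 1.1, 1.25, 1.5])
Q = np.array([[5.0, 7.0, 7.0, 8.0],
              [15.0, 20.0, 21.0, 24.0],
              [25.0, 22.0, 20.0, 23.0]])
V = pi[:, None] * Q * lam[None, :]

y, X = stack_model(V, Q)
beta = np.concatenate([1.0 / lam[1:], pi])    # deflators of periods 2..T, then reference prices
print(np.allclose(X @ beta, y))               # the stacked equations hold exactly without noise
```

With noise-free data the stacked equations are satisfied exactly by the true deflators and reference prices, which is a quick way to check that the layout matches the deflated system.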

The solution of the index estimation problem and its meaning
In the following, the MA is applied to estimate the price indexes via the intermediate estimation of the deflator indexes. To this end, let us look at

$$y = X\beta \tag{10}$$

as an over-identified system of linear equations and solve the derived system

$$L'y = L'X\beta \tag{11}$$

where L is a binary matrix of the same dimensions as X which satisfies the rank condition

$$r(L'X) = N + T - 1 \tag{12}$$

If the binary matrix $X_b$ associated with X satisfies the rank condition (12), then this matrix provides a convenient choice for L. This leads to the linear system

$$X_b'y = X_b'X\beta \tag{13}$$

whose solution

$$\hat{\beta} = (X_b'X)^{-1}X_b'y \tag{14}$$

gives the intended index estimator. Actually, the solution (14) turns out to be an instrumental variables (IV) estimator (Goldberger 1964), with the columns of $X_b$ acting as instruments. Going back to the linear model in Eq. (7) and assuming sphericalness for the error terms, i.e.,

$$E(\varepsilon\varepsilon') = \sigma^2 I_{NT} \tag{15}$$

the dispersion matrix of the estimator (14) is given by

$$\sigma^2 (X_b'X)^{-1} X_b'X_b (X'X_b)^{-1} \tag{16}$$

and the error variance $\sigma^2$ is estimated from the sum of the squared residuals, as in Eq. (17). The estimator of the price index vector follows as a by-product of the estimator in Eq. (14), and the statistical properties of the former can be derived from those of the latter, accordingly. The estimator in Eq. (14) crucially rests on the following

Lemma 1 The rank condition

$$r(X_b'X) = N + T - 1 \tag{18}$$

holds true for X in Eq. (8).

On this premise, we can establish the following

Theorem 1 The estimator $\hat{\tilde{\delta}} = [I_{T-1}, 0]\hat{\beta}$ of the deflator vector is given by Eq. (20), where $D_w$ and $D_q$ are diagonal matrices with the elements of the vectors

$$w = V_2'u_N, \qquad q = Qu_T \tag{21}$$

as diagonal entries, respectively; $u_N$ and $u_T$ are vectors of 1's with N and T components, respectively. An estimator of the variance-covariance matrix of the estimator is given in Eq. (22), where the quantity in Eq. (23) is an estimator of the error variance.

Proof See Appendix. ◻
In what follows, we will refer to $\hat{\lambda}_t$ as the reciprocal of the estimator $\hat{\tilde{\delta}}_{t-1}$ of the parent deflator. In this connection, we establish the following

Corollary 1 The estimator of the price index in period (or country) t, $2 \le t \le T$, is given by

$$\hat{\lambda}_t = \left(e_{t-1}'\hat{\tilde{\delta}}\right)^{-1} = \frac{1}{\hat{\tilde{\delta}}_{t-1}} \tag{24}$$

where $e_{t-1}$ is the (t − 1)-th elementary vector of T − 1 components and $\hat{\tilde{\delta}}_{t-1}$ is the (t − 1)-th component of the estimator (19). The estimated variance of $\hat{\lambda}_t$ is approximated as in Eq. (25), where $\hat{\Sigma}(\hat{\tilde{\delta}})$ is the matrix (22).

Proof See Appendix. ◻
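A minimal Python sketch of the estimation step may help to fix ideas. It computes the method-of-averages (IV) solution with the binary matrix associated with X as instrument, takes reciprocals of the deflators to obtain the index, and compares the result with a reduced (T − 1)-dimensional system in the period totals of values and the commodity totals of quantities. The block layout of X, the reduced system, the function names and the simulated numbers (including the multiplicative noise) are assumptions of this sketch, not the paper's own formulas.

```python
import numpy as np

def ma_estimator(V, Q):
    """MA/IV estimate; returns (lambda_hat, pi_hat) with lambda_hat[0] = 1 (base period)."""
    N, T = V.shape
    y = np.concatenate([V[:, 0], np.zeros(N * (T - 1))])
    X = np.zeros((N * T, (T - 1) + N))
    X[:N, T - 1:] = np.diag(Q[:, 0])
    for t in range(1, T):
        rows = slice(t * N, (t + 1) * N)
        X[rows, t - 1] = -V[:, t]
        X[rows, T - 1:] = np.diag(Q[:, t])
    Xb = (X != 0).astype(float)                  # binary matrix associated with X
    beta = np.linalg.solve(Xb.T @ X, Xb.T @ y)   # beta_hat = (X_b'X)^{-1} X_b'y
    delta, pi_hat = beta[:T - 1], beta[T - 1:]
    return np.concatenate([[1.0], 1.0 / delta]), pi_hat

def deflators_reduced(V, Q):
    """Deflators of periods 2..T from the reduced system (assumed equivalent form)."""
    w = V[:, 1:].sum(axis=0)                     # total value per non-base period
    q = Q.sum(axis=1)                            # total quantity per commodity
    A = np.diag(w) - Q[:, 1:].T @ (V[:, 1:] / q[:, None])
    b = Q[:, 1:].T @ (V[:, 0] / q)
    return np.linalg.solve(A, b)

# Hypothetical data: 3 commodities, 4 periods
pi_true = np.array([2.0, 3.0, 5.0])
lam_true = np.array([1.0, 1.1, 1.25, 1.5])
Q = np.array([[5.0, 7.0, 7.0, 8.0],
              [15.0, 20.0, 21.0, 24.0],
              [25.0, 22.0, 20.0, 23.0]])
V_clean = pi_true[:, None] * Q * lam_true[None, :]
rng = np.random.default_rng(0)
V_noisy = V_clean * np.exp(rng.normal(0.0, 0.05, size=Q.shape))  # illustrative noise only

lam_clean, pi_clean = ma_estimator(V_clean, Q)
lam_noisy, pi_noisy = ma_estimator(V_noisy, Q)
delta_reduced = deflators_reduced(V_noisy, Q)
print(lam_noisy)
print(1.0 / delta_reduced)
```

In this sketch the two routes give the same deflators, and noise-free data return the index and the reference prices used to generate the values.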
It can be shown that, under sphericalness, the analysis of the residuals of the regression model in Eq. (7) associated with the estimator in Eq. (14) leads to the systems of equations which determine the Geary-Khamis (GK) index. Thus, the estimator is a closed-form expression of the GK index. Although the GK index has been widely investigated (see, e.g., Heston and Lipsey 2007; Balk 2012; Diewert and Fox 2017, 2022), a closed-form formula of the index for the general case of an arbitrary number of periods and/or countries has been lacking up to now. The index formula devised in this paper provides the intended result within a regression model framework with the inherent inferential statistical toolkit as a dowry. In order to prove that $\hat{\lambda}_t$ represents a closed-form expression of the GK index, one checks that the condition in Eq. (26) holds true. Simple computations, bearing in mind Eqs. (50) and (61) in the Appendix, then lead, together with Eq. (26), to a pair of equation systems that can be rewritten as Eqs. (33) and (34). Solving Eqs. (33) and (34) for $\hat{\lambda}_t$ and $\hat{\pi}_i$ yields

$$\hat{\lambda}_t = \frac{\sum_{i=1}^{N} v_{i,t}}{\sum_{i=1}^{N} \hat{\pi}_i q_{i,t}} \tag{35}$$

$$\hat{\pi}_i = \frac{\sum_{t=1}^{T} v_{i,t}\,\hat{\lambda}_t^{-1}}{\sum_{t=1}^{T} q_{i,t}} \tag{36}$$

Under $\lambda_1 = 1$, Eqs. (35) and (36) read as the equation systems of the GK index in the temporal setting and the index can be obtained accordingly (as noticed by an anonymous referee, to whom we are indebted).
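The link with the Geary-Khamis system can be checked numerically. The Python sketch below alternates between the two GK relations (reference prices given the indexes, indexes given the reference prices) with the base-period index kept at one, and compares the limit with the solution of a reduced linear system. The quantity matrix is the 4 × 6 array that appears in the simulation section, read here as Q; the reference prices, the index values, the noise and the function names are hypothetical, and the reduced system is our assumed form of the closed-form solution.

```python
import numpy as np

def gk_iteration(V, Q, n_iter=2000):
    """Alternate between the two GK relations, keeping the base-period deflator at 1."""
    delta = np.ones(V.shape[1])
    q_tot, w = Q.sum(axis=1), V.sum(axis=0)
    for _ in range(n_iter):
        pi = (V @ delta) / q_tot                 # pi_i = sum_t v_it / lambda_t / sum_t q_it
        delta[1:] = (pi @ Q[:, 1:]) / w[1:]      # 1/lambda_t = sum_i pi_i q_it / sum_i v_it
    return 1.0 / delta, pi

def gk_direct(V, Q):
    """Same index from one linear solve (assumed closed-form route)."""
    w, q = V[:, 1:].sum(axis=0), Q.sum(axis=1)
    A = np.diag(w) - Q[:, 1:].T @ (V[:, 1:] / q[:, None])
    delta = np.linalg.solve(A, Q[:, 1:].T @ (V[:, 0] / q))
    return np.concatenate([[1.0], 1.0 / delta])

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
pi_true = np.array([3.0, 2.5, 1.7, 2.4])                   # hypothetical reference prices
lam_true = np.array([1.0, 1.05, 1.10, 1.15, 1.20, 1.30])   # hypothetical index
rng = np.random.default_rng(1)
V = pi_true[:, None] * Q * lam_true[None, :] * np.exp(rng.normal(0.0, 0.05, size=Q.shape))

lam_iter, pi_iter = gk_iteration(V, Q)
lam_direct = gk_direct(V, Q)
print(np.max(np.abs(lam_iter - lam_direct)))
```

The iteration is only a check: the direct solve needs no starting values or stopping rule, which is the practical advantage of having a closed form.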

Dropping the assumption of constant-variance errors
In the previous section, the estimation of the deflator vector (and eventually of the price index $\lambda$) has been performed under the assumption of error sphericalness, which embodies both uncorrelatedness and constant variance of the disturbances. Leaving aside the issue of dependence, in particular correlation, which we exclude from our analysis, let us investigate the assumption of constant variance. A hypothesis of constant variance over time for the errors is tenable by virtue of the argument that the model specification is the outcome of a deflating transformation via $D_{\delta} = D_{\lambda}^{-1}$. No a priori justification, other than computational convenience, can be advanced for a constant-variance hypothesis across different commodities. So it is worth considering the issue more deeply. The analysis cannot but start from the residuals corresponding to the estimator $\hat{\beta}$, that is

$$\hat{\varepsilon} = y - X\hat{\beta} \tag{37}$$

As a simple computation shows, the (sub)vector of the residuals referable to the nth commodity over the time span $1 \le t \le T$ is given by

$$\hat{\varepsilon}_n = J_n'\hat{\varepsilon} \tag{38}$$

where $J_n$ is the selection matrix

$$J_n = [e_n, e_{N+n}, \ldots, e_{(T-1)N+n}] \tag{39}$$

with $e_j$ denoting the jth elementary vector of TN components. Accordingly, an estimator of the variance $\sigma_n^2$ of the T errors referable to the nth commodity is given by Eq. (40), with $\hat{\sigma}^2$ given by Eq. (17) and $\hat{\varsigma}_n^2$ ensuing as a by-product. It follows that the former assumption of a scalar dispersion matrix for the errors no longer holds and it must be replaced by the following specification

$$E(\varepsilon\varepsilon') = \sigma^2 (I_T \otimes D_{\varsigma}^2) \tag{41}$$

where $D_{\varsigma}^2$ is the $N \times N$ diagonal matrix whose diagonal entries are the squares of the scalars $\varsigma_1, \varsigma_2, \ldots, \varsigma_N$. Under Eq. (41), the model

$$y = X\beta + \varepsilon \tag{42}$$

is no longer a classical linear model. Nevertheless, it can easily be brought back to a classical model by premultiplying both sides of Eq. (42) by the matrix $(I_T \otimes D_{\varsigma}^{-1})$, which yields the specification

$$\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon} \tag{43}$$

where

$$\tilde{y} = (I_T \otimes D_{\varsigma}^{-1})y, \quad \tilde{X} = (I_T \otimes D_{\varsigma}^{-1})X, \quad \tilde{\varepsilon} = (I_T \otimes D_{\varsigma}^{-1})\varepsilon \tag{44}$$

with $\tilde{\varepsilon}$ enjoying the sphericalness property

$$E(\tilde{\varepsilon}\tilde{\varepsilon}') = \sigma^2 I_{NT} \tag{45}$$

Noting that the binary matrix associated with $\tilde{X}$ coincides with that associated with X, the vector $\beta$ can be newly estimated via the method-of-averages approach, with $\tilde{X}_b = X_b$ playing the role of the instrumental variable matrix. Eventually, we get the estimator

$$\tilde{\beta} = (X_b'\tilde{X})^{-1}X_b'\tilde{y} \tag{47}$$

This shows that the sphericalness assumption can be relaxed in the case of interest. This paves the way to further extensions, if required.
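The rescaling argument can be sketched in Python as a two-step procedure: fit the unweighted model, derive one scale factor per commodity from its residuals, rescale the rows of the stacked model and re-estimate with the same binary instrument. The per-commodity scale used here (root mean square of the T residuals of each commodity) is our own simple choice and not necessarily the paper's variance estimator; the block layout of X, the function names and all numbers other than the quantity matrix are hypothetical.

```python
import numpy as np

def stack_model(V, Q):
    N, T = V.shape
    y = np.concatenate([V[:, 0], np.zeros(N * (T - 1))])
    X = np.zeros((N * T, (T - 1) + N))
    X[:N, T - 1:] = np.diag(Q[:, 0])
    for t in range(1, T):
        rows = slice(t * N, (t + 1) * N)
        X[rows, t - 1] = -V[:, t]
        X[rows, T - 1:] = np.diag(Q[:, t])
    return y, X

def commodity_scales(V, Q):
    """Root-mean-square residual per commodity from the unweighted fit (assumed choice)."""
    N, T = V.shape
    y, X = stack_model(V, Q)
    Xb = (X != 0).astype(float)
    resid = y - X @ np.linalg.solve(Xb.T @ X, Xb.T @ y)
    return np.sqrt((resid.reshape(T, N) ** 2).mean(axis=0))  # row t holds period t

def weighted_ma(V, Q, scale):
    """Re-estimate after dividing the rows of commodity n by scale[n]."""
    N, T = V.shape
    y, X = stack_model(V, Q)
    Xb = (X != 0).astype(float)                  # zero pattern is unchanged by row scaling
    d = np.tile(1.0 / scale, T)                  # diagonal of I_T kron D^{-1}
    beta = np.linalg.solve(Xb.T @ (X * d[:, None]), Xb.T @ (y * d))
    return np.concatenate([[1.0], 1.0 / beta[:T - 1]]), beta[T - 1:]

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
pi_true = np.array([3.0, 2.5, 1.7, 2.4])                   # hypothetical
lam_true = np.array([1.0, 1.05, 1.10, 1.15, 1.20, 1.30])   # hypothetical
V_clean = pi_true[:, None] * Q * lam_true[None, :]
rng = np.random.default_rng(2)
sd = np.array([0.02, 0.05, 0.10, 0.03])[:, None]           # commodity-specific noise, illustrative
V_noisy = V_clean * np.exp(rng.normal(0.0, 1.0, size=Q.shape) * sd)

scales = commodity_scales(V_noisy, Q)
lam_w, pi_w = weighted_ma(V_noisy, Q, scales)
lam_check, pi_check = weighted_ma(V_clean, Q, np.array([1.0, 2.0, 0.5, 3.0]))
print(lam_w)
```

Setting all scale factors to the same value returns the unweighted estimate, so the weighted version nests the spherical case.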

A simulation based on log-normal random draws
In this section we illustrate the performance of the index developed in Sect. 3, called MA index hereafter, through three simulated examples. The aim of this analysis is to investigate, in a comparative manner, the capability of both the MA and the time-product-dummy (TPD) (de Haan et al. 2020) indexes to reproduce the "true" index $\lambda$, in a multi-period perspective. To this aim, let us assume that quantities, Q, reference prices, $\pi$, and price indexes, $\lambda$, of four commodities over six periods are specified, with the quantity matrix $Q = [q_1, \ldots, q_6]$, whose columns are (4, 1) vectors, given by

$$Q = \begin{bmatrix} 5.00 & 7.00 & 7.00 & 8.00 & 10.00 & 12.00 \\ 15.00 & 20.00 & 21.00 & 24.00 & 25.00 & 27.00 \\ 25.00 & 22.00 & 20.00 & 23.00 & 23.00 & 25.00 \\ 5.00 & 6.00 & 6.00 & 8.00 & 10.00 & 15.00 \end{bmatrix}$$

Then, with these data at hand, the values, V, have been computed as in (3) where the random terms, H, without loss of generality, have been generated from a standard log-Normal distribution. The prices, P, needed to compute the TPD index, have been worked out as ratios between values and quantities. In this simulation, the matrices Q and V have been used to compute both the MA and the TPD indexes, in a multi-period perspective. In this regard, we have considered three different cases that cover three empirical scenarios:
- Case.1 Complete price tableau, implying a reference basket including a complete dataset for the four commodities;
- Case.2 Incomplete price tableau, assuming that the fourth and the second commodity are missing in the first and second period, respectively (that is, $q_{41} = v_{41} = 0$ and $q_{22} = v_{22} = 0$), with a "standard" reference basket that includes only the first and the third commodities;
- Case.3 Incomplete price tableau, assuming that the fourth and the second commodity are missing in the first and second period, respectively (that is, $q_{41} = v_{41} = 0$ and $q_{22} = v_{22} = 0$), with the MA reference basket that includes commodities present in at least two periods, namely all four commodities.
The outcome of the Breusch-Pagan test has led us to rule out the presence of heteroscedasticity in all three scenarios. In what follows, the sum of the squares of the differences between the estimated MA and TPD indexes, $\hat{\lambda}$, and the "real" index, $\lambda$, has been worked out for the three said cases:
- Case.1 Complete price tableau: 0.0032 (MA) and 0.0066 (TPD);
- Case.2 Incomplete price tableau ("standard" basket): 0.004 (MA) and 0.010 (TPD);
- Case.3 Incomplete price tableau ("novel" basket): 0.001 (MA) and 0.003 (TPD).
It is worth noting that the MA index always provides a better fit to the index $\lambda$ than the TPD one. As expected, given the absence of heteroscedasticity across commodities, the MA index turns out to tally with the GK one. In all cases, the MA estimates turn out to be more efficient, as they have lower variances and, consequently, they are always included in a $2\sigma$ confidence band of the TPD index, as shown in Fig. 1. Looking at Fig. 2, we see that the values of the MA index provide the best fit to the real price index $\lambda$, avoiding the TPD overestimation issue present in all cases and, in particular, when there are missing prices.
These examples also highlight the role played by the reference prices, $\pi$, which are the prices that consumers are expected to pay for the commodities in a given period/country. Reference prices prove useful in obtaining estimates of the prices of those commodities which, being missing in the basket, cannot be determined. Indeed, the price of a commodity, say i, missing in a period, say t, is undetectable, but can be determined as $\hat{\pi}_i\hat{\lambda}_t$, where $\hat{\pi}_i$ and $\hat{\lambda}_t$ are the estimate of the reference price of the commodity i and the MA index at time t, respectively. This strategy has been used to estimate the prices of the second and fourth commodity in Case.3. According to (2), the prices $p_{4,1}$ and $p_{2,2}$ can be estimated as follows

$$\hat{p}_{4,1} = \hat{\pi}_4\hat{\lambda}_1, \qquad \hat{p}_{2,2} = \hat{\pi}_2\hat{\lambda}_2 \tag{48}$$

Note that $\hat{\pi}_i$ represents the price of the i-th commodity in the base period/country, here assumed to be t = 1. According to (48), the price estimates $\hat{p}_{i,t}$ at times t = 2, 3, …, T are obtained by updating $\hat{\pi}_i$ by means of the values of the index $\hat{\lambda}_t$ at these periods (or for the countries t = 2, 3, …, T). The price matrix of the complete tableau, worked out as the ratio between values and quantities, is

$$P = \begin{bmatrix} 3.26 & 3.75 & 3.81 & 3.93 & 4.00 & 4.41 \\ 2.52 & 2.66 & 2.74 & 2.88 & 2.95 & 3.14 \\ 1.74 & 1.76 & 1.87 & 2.00 & 2.05 & 2.10 \\ 2.43 & 2.96 & 3.06 & 3.32 & 3.18 & 3.59 \end{bmatrix}$$

To assess the goodness of the $\hat{p}_{i,t}$ estimates, the sum of the squared differences between observed prices, $p_i$, and estimated prices, $\hat{p}_i$, has been computed for all the commodities in the three cases under study. These sums confirm the satisfactory performance of the MA index in reproducing missing prices by using reference ones. Clearly, this result hinges on the ability of this approach to provide good estimates of the reference prices. In this regard, Fig. 3, which compares the estimates of the MA reference prices with the "real" ones, provides evidence of the goodness of the estimated reference prices for all commodities, also in the presence of missing prices. Indeed, all the points are close to the bisector of each panel. Unfortunately, given that the TPD approach does not provide reference prices as a spin-off, a comparison between the MA and the TPD indexes under this latter aspect, that is, the "quality" of missing prices constructed by using reference prices, is not possible.
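The imputation of missing prices through reference prices can be sketched as follows. The Python code sets the quantity and the value of the fourth commodity in period 1 and of the second commodity in period 2 to zero, estimates the index and the reference prices from the remaining data, and fills the two missing cells with the product of the estimated reference price and the estimated index. The quantity matrix is the one from the simulation section; the reference prices and the index used to generate the values are hypothetical, the data are noise-free for clarity, and the reduced linear system used for estimation is our assumed form of the MA solution.

```python
import numpy as np

def ma_index(V, Q):
    """Index and reference prices from period and commodity totals (assumed reduced form).

    Missing commodity/period cells are coded as zero quantity and zero value.
    """
    w = V[:, 1:].sum(axis=0)                       # value totals of periods 2..T
    q = Q.sum(axis=1)                              # quantity totals per commodity
    A = np.diag(w) - Q[:, 1:].T @ (V[:, 1:] / q[:, None])
    delta = np.linalg.solve(A, Q[:, 1:].T @ (V[:, 0] / q))
    delta = np.concatenate([[1.0], delta])         # base-period deflator equal to 1
    pi_hat = (V @ delta) / q                       # reference prices as a by-product
    return 1.0 / delta, pi_hat

Q = np.array([[5.0, 7.0, 7.0, 8.0, 10.0, 12.0],
              [15.0, 20.0, 21.0, 24.0, 25.0, 27.0],
              [25.0, 22.0, 20.0, 23.0, 23.0, 25.0],
              [5.0, 6.0, 6.0, 8.0, 10.0, 15.0]])
pi_true = np.array([3.0, 2.5, 1.7, 2.4])                   # hypothetical
lam_true = np.array([1.0, 1.05, 1.10, 1.15, 1.20, 1.30])   # hypothetical
V = pi_true[:, None] * Q * lam_true[None, :]               # noise-free values

Q_miss, V_miss = Q.copy(), V.copy()
for i, t in [(3, 0), (1, 1)]:                      # commodity 4 in period 1, commodity 2 in period 2
    Q_miss[i, t] = 0.0
    V_miss[i, t] = 0.0

lam_hat, pi_hat = ma_index(V_miss, Q_miss)
P_hat = np.outer(pi_hat, lam_hat)                  # fitted price tableau pi_i * lambda_t
print(P_hat[3, 0], P_hat[1, 1])                    # imputed prices for the two missing cells
```

Without noise the two imputed prices coincide with the prices implied by the generating reference prices and index; with noisy values they would be estimates whose quality depends on that of the estimated reference prices.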

Concluding remarks
The paper provides the solution to the multi-period and/or multilateral price index problem in closed form, under proper error assumptions, taking a regression model specification as the frame of reference and the method of averages as the estimation approach. Two specifications are assumed in turn: sphericalness and commodity-dependent variances of the error terms. The former leads to an expression in compact form of the price indexes for an arbitrary number of periods and/or countries. The regression inferential apparatus applies accordingly. The price-index expression thus obtained proves to tally with the already known Geary-Khamis (GK) index, which is eventually endowed with the inferential heritage of the former. The second error specification drops the homoskedasticity assumption in favour of a commodity-dependent hypothesis for the error variances, which leads to a new and more general price-index formula with a significance that extends beyond the GK case and opens up pathways for further research.

A.1 Proof of Lemma 1
Let X be defined as in Eq. (8) and $X_b$ be the binary matrix associated with X, and consider the matrix $(X_b)'X$ given in Eq. (50), with $w > 0_{T-1}$ and $q > 0_N$ defined in Eq. (21). Here use has been made of a formula (see Faliva 1996) which holds for any pair of matrices A and B of order $N \times M$. As shown by Guttman (1946), it follows that the matrix $(X_b)'X$ is of full rank if and only if $I - \Psi$ is non-singular, which occurs if the maximum eigenvalue of the non-negative matrix $\Psi = Q_2' D_q^{-1} V_2 D_w^{-1}$ is lower than one. According to the Perron-Frobenius theorem (Lancaster 1968), this is the case if the condition in Eq. (54) holds with some inequality holding strictly. Simple computations show that Eq. (54) follows accordingly. Thus, $I - \Psi$ is non-singular and the rank condition holds, as requested. ◻

Fig. 3 MA estimates of the reference prices compared with the "real" ones for Case.1, Case.2 and Case.3, respectively. The dotted line represents the bisector. Panel title: Reference prices without missing

A.2 Proof of Theorem 1
From Eqs. (8) and (14), it follows that the estimator takes the form in Eq. (59), with $(X_b'X)$ defined as in Eq. (50). A well-known partitioned-inversion formula (Faliva and Zoia 2009) gives the inverse in Eq. (60). Besides, Eq. (61) holds. From Eqs. (59), (60) and (61), the closed form of the MA deflator, Eq. (62), is easily obtained. The more explicit representation in Eq. (20) follows from the latter by simple computations, which proves (20). The estimator in Eq. (22) of the dispersion matrix of the MA estimator follows by taking into account that $\hat{\Sigma}(v_1) = \hat{\Sigma}(\varepsilon_1) = \hat{\sigma}^2 I_N$, according to Eq. (15). The estimator in Eq. (23) of the variance $\sigma^2$ follows from Eq. (17) by working out the sum of the squared residuals, bearing in mind Eqs. (8), (14) and (59), and noting that

$$X'y = \begin{bmatrix} 0 \\ D_{q_1} v_1 \end{bmatrix} \tag{67}$$

◻

A.3 Proof of Corollary 1
The estimator of the price index at time t is the reciprocal of the estimator of the (t − 1)th component of $\hat{\tilde{\delta}}$, namely $\hat{\tilde{\delta}}_{t-1}$, given in (20), that is, $\hat{\lambda}_t = 1/\hat{\tilde{\delta}}_{t-1}$. The variance of the estimator of the price index can be approximated following Benaroya et al. (2005). ◻
Funding Open access funding provided by Università Cattolica del Sacro Cuore within the CRUI-CARE Agreement. No funding was received for conducting this study.

Fig. 1 Comparison of the MA and the TPD indexes (with $2\sigma$ confidence bands, given (22) for the MA index) for Case.1, Case.2 and Case.3, respectively

Fig. 2 Comparison of the MA and the TPD indexes (without confidence bands) for Case.1, Case.2 and Case.3, respectively