Improving quality of the scanner CPI: proposition of new multilateral methods

Scanner data can be obtained from a wide variety of retailers (supermarkets, home electronics, Internet shops, etc.) and provide information at the level of the barcode, i.e. the Global Trade Item Number or its European version, the European Article Number. One of the advantages of using scanner data in Consumer Price Index measurement is that they contain complete transaction information, i.e. prices and quantities for every item sold. One of the new challenges connected with scanner data is the choice of the index formula, which should be able to reduce the chain drift bias and the substitution bias. Multilateral index methods seem to be the best choice in the case of dynamic scanner data sets. These indices work on a whole time window and are transitive, which is a key property in eliminating the chain drift effect. Following the so-called identity test, however, one may expect that even when only prices return to their original values, the index becomes one. Unfortunately, the commonly used multilateral indices (GEKS, CCDI, GK, TPD, TDH) do not meet the identity test. The paper discusses the proposal of two multilateral indices, the idea of which resembles the GEKS index, but which meet the identity test and most of the other tests. In an empirical study, these indices are compared, inter alia, with the SPQ index, which is relatively new and also meets the identity test. Analytical considerations as well as the empirical study confirm the high usefulness of the proposed indices.


Introduction
Scanner data have numerous advantages compared to traditional survey CPI data collection because such data sets are much bigger than traditional ones and they contain complete transaction information, i.e. information about prices and quantities at the retailer's code or the GTIN/EAN/SKU barcode level. Scanner data sets may provide some additional information about products (including such attributes as size, colour, package quantity, etc.). These attributes are useful for aggregating items into homogeneous groups. In addition, scanner data are much cheaper to obtain than survey data traditionally used for CPI calculation. As pointed out in Eurostat's Practical Guide for Processing Supermarket Scanner Data from 2017: "In the traditional price collection, price collectors have to trust intuition and common sense and it may happen that prices are collected as long as the item is available even though it is no longer representative. In scanner data the representativeness is guaranteed" (Eurostat 2017).
There are many challenges involved in scanner data processing, e.g. automatic classification of products into COICOP groups, product matching, data filtering, data aggregation, etc. (Białek and Berȩsewicz 2021), since a typical supermarket uses 10,000-25,000 item codes. One of the main challenges is the proper choice of the price index formula, which should take into account seasonal products together with new and disappearing goods. Scanner data contain expenditure information at the item level, which makes it possible to use expenditure shares of items as weights for calculating price indices at the lowest (elementary) level of data aggregation. Most statistical agencies use bilateral index numbers in the CPI measurement, i.e. they use indices which compare prices and quantities of a group of commodities from the current period with the corresponding prices and quantities from a base (fixed) period. A multilateral index is compiled over a given time window composed of $T + 1$ successive months (typically $T = 12$). Multilateral price indices take as input all prices and quantities of the previously defined individual products which are available in a given time window, i.e. in at least two of its periods. These methods are a very good choice in the case of dynamic scanner data, where we observe a large rotation of products and strong seasonality (Chessa et al. 2017). Moreover, multilateral indices are transitive, which means in practice that the calculation of the price dynamics for any two moments in the time window does not depend on the choice of the base period. By definition, transitivity eliminates the chain drift problem which may occur while using scanner data. The chain drift can be formalized in terms of the violation of the multi-period identity test. According to this test, when all prices and quantities in a current period revert to their values from the base period, the index should indicate no price change, i.e. it should equal one.
Thus, multilateral indices are free from the chain drift within a given estimation time window [0, T]. Although Ivancic et al. (2011) have suggested that the use of multilateral indices in the scanner data case can solve the chain drift problem, most statistical agencies using scanner data still make use of the monthly chained Jevons index (Chessa et al. 2017).
The Jevons index (Jevons 1865) is an unweighted bilateral formula and it is used at the elementary aggregation level in the traditional data collection. As the scanner data provide information on consumption, it seems more appropriate to use weighted indices. Unfortunately, bilateral weighted formulas do not take into account all information from the time window, while the frequently chained weighted indices (even superlative ones) may generate chain drift bias (Chessa 2015) and therefore do not reflect a reasonable price change over longer time intervals. For this reason, many countries have experimented with multilateral indices or even implemented them for the regular production of price indices (Krsinich 2014; Inklaar and Diewert 2016; Chessa et al. 2017; Chessa 2019; Diewert and Fox 2018; de Haan et al. 2021).
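The chain drift mentioned above can be illustrated with a minimal sketch. It assumes a chained Laspeyres index (chosen here only because it is the simplest weighted bilateral formula) and entirely hypothetical data for two products, one of which is discounted in period 1 before all prices and quantities revert in period 2. The function name `laspeyres` and the data layout (one dictionary of prices and one of quantities per period) are illustrative assumptions, not part of any package used in the paper.

```python
def laspeyres(p_base, q_base, p_cur):
    """Laspeyres price index over the products observed in both periods."""
    matched = p_base.keys() & p_cur.keys()
    numerator = sum(p_cur[i] * q_base[i] for i in matched)
    denominator = sum(p_base[i] * q_base[i] for i in matched)
    return numerator / denominator


# Hypothetical data: product "A" is on sale in period 1, then everything reverts.
prices = [{"A": 1.0, "B": 1.0}, {"A": 0.5, "B": 1.0}, {"A": 1.0, "B": 1.0}]
quantities = [{"A": 10, "B": 10}, {"A": 50, "B": 5}, {"A": 10, "B": 10}]

# Direct comparison of period 2 with period 0: no price change.
direct = laspeyres(prices[0], quantities[0], prices[2])

# Monthly chained index: product of the period-to-period links.
chained = 1.0
for t in range(1, len(prices)):
    chained *= laspeyres(prices[t - 1], quantities[t - 1], prices[t])

print(direct)   # 1.0
print(chained)  # about 1.375, although prices and quantities are back at their base values
```

In this toy example the chained index violates the multi-period identity test, whereas the direct comparison does not.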
Following the so-called identity test (International Labour Office 2004; von der Lippe 2007), however, one may expect that even when only prices return to their original values and quantities do not, the index becomes one. This test is quite restrictive for multilateral indices and causes some controversy among price statisticians. Nevertheless, it is mentioned among the axioms regarding multilateral indices both in the publications of the European Commission and in journals from the area of official statistics (Zhang et al. 2019). Unfortunately, the commonly used multilateral indices (GEKS, CCDI, GK, TPD, TDH) do not meet the identity test. The main aim of the paper is to present and discuss the proposition of two multilateral indices, the idea of which resembles the GEKS index, but which meet the identity test and most of the other axioms. The proposed indices are compared with the multilateral SPQ index method, which is relatively new and also meets the identity test. Analytical considerations as well as empirical and simulation studies confirm the high usefulness of the proposed indices. We also compare how time-consuming all considered index methods are.
The paper is organised as follows: Sect. 2 covers the most important multilateral indices. Section 3 presents the main criteria that can assist in selecting a multilateral index. Section 4 proposes a general class of multilateral indices with two specific cases being discussed in Sect. 5. Section 6 is an empirical study, which confirms the high usefulness of the proposed multilateral indices. Section 7 lists the most important conclusions of the research carried out.

Multilateral price index methods
Multilateral index methods originate in comparisons of price levels across countries or regions. Commonly known methods include the GEKS method (Gini 1931; Eltetö and Köves 1964), the Geary-Khamis method (Geary 1958; Khamis 1972), the CCDI method (Caves et al. 1982) or the Time Product Dummy method (de Haan and Krsinich 2018). Before we present the multilateral price indices which were used in the paper for comparisons, let us denote sets of homogeneous products belonging to the same product group in months 0 and $t$ by $G_0$ and $G_t$ respectively, and let $G_{0,t}$ denote the set of matched products in both moments 0 and $t$. Although, in general, the item universe may be very dynamic in the scanner data case, we assume that there exists at least one product available during the whole time interval $[0, T]$. Let $p_i^\tau$ and $q_i^\tau$ denote the price and quantity of the $i$-th product at time $\tau$, and let $N_{0,t} = \mathrm{card}\, G_{0,t}$.

The quality adjusted unit value index and the Geary-Khamis (GK) method
The term "quality adjusted unit value method" (QU method) was introduced by Chessa (2016). The QU method is a family of unit value based index methods with the above-mentioned Geary-Khamis (GK) method as a special case. According to the QU method, the price index $P^{0,t}_{QU}$, which compares period $t$ with base period 0, is defined as follows:
$$P^{0,t}_{QU} = \frac{\sum_{i \in G_t} p_i^t q_i^t \Big/ \sum_{i \in G_0} p_i^0 q_i^0}{\sum_{i \in G_t} v_i q_i^t \Big/ \sum_{i \in G_0} v_i q_i^0}, \qquad (1)$$
where the numerator in (1) is the measure of the turnover (expenditure) change between the two months and the denominator in (1) is a weighted quantity index. Prices of products, $p_i^0$ and $p_i^t$, are converted into "quality adjusted prices" $p_i^0 / v_i$ and $p_i^t / v_i$. Different choices of the factors $v_i$ lead to different price index formulas. In the GK method, the weights $v_i$ are defined as follows:
$$v_i = \sum_{z=0}^{T} \varphi_i^z \frac{p_i^z}{P^{0,z}_{QU}}, \qquad (2)$$
where
$$\varphi_i^z = \frac{q_i^z}{\sum_{s=0}^{T} q_i^s}, \qquad (3)$$
and $[0, T]$ is the entire time interval of product observations (typically $T = 12$, see Diewert and Fox 2018). Note that formulas (1), (2) and (3) lead to a set of equations which should be solved simultaneously. The solution can be found iteratively (Chessa 2016) or as a solution to an eigenvalue problem (Diewert 1999).
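The iterative route to the Geary-Khamis solution can be sketched as a plain fixed-point iteration: start from indices equal to one, compute the quality adjustment factors, recompute the indices, and repeat until they stabilise. The sketch below is an illustration under stated assumptions rather than the algorithm of any particular package: the function name `geary_khamis`, the data layout (one dictionary of prices and quantities per period), the stopping rule and the toy data are all hypothetical.

```python
def geary_khamis(prices, quantities, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for the GK index; returns [P^{0,0}, ..., P^{0,T}]."""
    n = len(prices)
    items = set().union(*(p.keys() for p in prices))
    # Total quantity of each product over the whole window (denominator of phi).
    total_q = {i: sum(q.get(i, 0.0) for q in quantities) for i in items}
    turnover = [sum(prices[t][i] * quantities[t][i] for i in prices[t])
                for t in range(n)]
    index = [1.0] * n
    for _ in range(max_iter):
        # Quality adjustment factors v_i: quantity-weighted deflated prices.
        v = {i: sum(quantities[z][i] * prices[z][i] / index[z]
                    for z in range(n) if i in prices[z]) / total_q[i]
             for i in items}
        # Weighted quantities sum_i v_i q_i^t for each period.
        qv = [sum(v[i] * quantities[t][i] for i in prices[t]) for t in range(n)]
        # QU formula: turnover index divided by weighted quantity index.
        new_index = [(turnover[t] / turnover[0]) / (qv[t] / qv[0])
                     for t in range(n)]
        if max(abs(a - b) for a, b in zip(new_index, index)) < tol:
            return new_index
        index = new_index
    return index


# Hypothetical check: all prices double between the two periods.
prices = [{"A": 2.0, "B": 5.0}, {"A": 4.0, "B": 10.0}]
quantities = [{"A": 10, "B": 10}, {"A": 20, "B": 5}]
gk = geary_khamis(prices, quantities)
print(gk)  # the second element converges to 2.0
```

When every price is multiplied by the same factor, the iteration settles on that factor, which is a convenient sanity check of the proportionality property.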

The GEKS method
Let us consider a time interval $[0, T]$ of observations of prices and quantities that will be used for constructing the GEKS index. The GEKS price index between months 0 and $t$ is an unweighted geometric mean of $T + 1$ ratios of bilateral price indices $P^{\tau,t}$ and $P^{\tau,0}$, which are based on the same price index formula. The bilateral price index formula should satisfy the time reversal test, i.e. it should satisfy the condition $P^{a,b} \cdot P^{b,a} = 1$. Typically, the GEKS method uses the superlative Fisher (1922) price index, resulting in the following formula:
$$P^{0,t}_{GEKS} = \prod_{\tau=0}^{T} \left( \frac{P_F^{\tau,t}}{P_F^{\tau,0}} \right)^{\frac{1}{T+1}}. \qquad (4)$$
Please note that de Haan and van der Grient (2011) suggest that the Törnqvist price index formula (Törnqvist 1936) could be used instead of the Fisher price index in the Gini methodology. Following Diewert and Fox (2018), the multilateral price comparison method involving the GEKS method based on the Törnqvist price index is called the CCDI method.
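A short sketch may help to see how the GEKS formula is assembled from bilateral Fisher indices and why it is transitive. The function names `fisher` and `geks`, the generalisation to an arbitrary pair of periods (`start`, `end`) and the toy data are illustrative assumptions; bilateral indices are computed on the products matched between the two compared periods.

```python
from math import prod, sqrt


def fisher(p_a, q_a, p_b, q_b):
    """Fisher index comparing period b with period a on matched products."""
    matched = p_a.keys() & p_b.keys()
    laspeyres = (sum(p_b[i] * q_a[i] for i in matched)
                 / sum(p_a[i] * q_a[i] for i in matched))
    paasche = (sum(p_b[i] * q_b[i] for i in matched)
               / sum(p_a[i] * q_b[i] for i in matched))
    return sqrt(laspeyres * paasche)


def geks(prices, quantities, start, end):
    """GEKS index between two periods: geometric mean of T + 1 Fisher ratios."""
    n = len(prices)
    ratios = [fisher(prices[k], quantities[k], prices[end], quantities[end])
              / fisher(prices[k], quantities[k], prices[start], quantities[start])
              for k in range(n)]
    return prod(ratios) ** (1.0 / n)


# Hypothetical window of three months; product "C" enters in month 1.
prices = [{"A": 1.0, "B": 2.0},
          {"A": 1.1, "B": 1.8, "C": 3.0},
          {"A": 1.2, "B": 2.1, "C": 2.9}]
quantities = [{"A": 10, "B": 6},
              {"A": 9, "B": 8, "C": 2},
              {"A": 8, "B": 5, "C": 4}]

g01 = geks(prices, quantities, 0, 1)
g12 = geks(prices, quantities, 1, 2)
g02 = geks(prices, quantities, 0, 2)
print(abs(g02 - g01 * g12) < 1e-12)  # True: the GEKS index is transitive
```

The ratios telescope when two GEKS indices are multiplied, so the result for months 0 and 2 does not depend on whether month 1 is used as an intermediate link.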

The Time Product Dummy method
The Time Product Dummy method (TPD) uses a panel regression approach to estimating price indices using all available data from a given reference period. The econometric model for log prices observed during the time interval $[0, T]$ is as follows:
$$\ln p_i^t = \alpha + \sum_{t=1}^{T} \delta^t D_i^t + \sum_{j=1}^{N-1} \gamma_j D_{i,j} + \varepsilon_i^t, \qquad (5)$$
where $\alpha$ is the intercept, $N$ is the number of individual products, $D_i^t$ is the dummy variable that has the value of 1 if the $i$-th product is available in period $t$ and 0 otherwise, $\delta^t$ are the time dummy parameters, $D_{i,j}$ is a dummy variable with the value of 1 if the $i$-th observation relates to the individual product $j$ and 0 otherwise, $\gamma_j$ represents the item fixed effects and $\varepsilon_i^t$ denotes the corresponding random error term. The quality adjusted price of a set of products $G_t$ in month $t$ can be written as: Following Diewert (2004), we assume that model (5) is estimated by the Weighted Least Squares (WLS) method with expenditure shares used as weights. Where the price is adjusted using the item fixed effects, i.e. $v_i = \exp(\gamma_i)$, the TPD index can be expressed as follows:
$$P^{0,t}_{TPD} = \frac{\prod_{i \in G_t} \left( p_i^t / v_i \right)^{s_i^t}}{\prod_{i \in G_0} \left( p_i^0 / v_i \right)^{s_i^0}}, \qquad (7)$$
where $s_i^\tau$ denotes the expenditure share of the $i$-th product in period $\tau$.
For the considered period $t$ define $P^{0,t}_{SPQ} = P_F(p^{r^*}, p^t, q^{r^*}, q^t) \, P^{0,r^*}_{SPQ}$.

Criteria for selecting multilateral indices
Multilateral indices seem to be the best choice for dynamic scanner data. Nevertheless, the statistical office should be aware of the fact that the use of each of the discussed multilateral indices has both advantages and disadvantages. In particular, the choice of the final formula to implement may depend on the set of criteria we will use. Theoretical and empirical points of view should be taken into account when choosing a final price index formula. The multilateral indices proposed in the paper are assessed through the prism of an axiomatic approach. For this reason, Sect. 3.1 focuses on the basic axioms dedicated to multilateral indices. Section 3.2, however, also discusses other criteria taken into account when selecting the price index, with two additional criteria (the computation time and a comparison to the theoretical expected relative price value) being used in the work (see Sect. 3.2).

Axiomatic approach
According to the axiomatic approach, desirable index properties (the so-called "tests") are defined that a multilateral index may or may not satisfy. The list of tests for multilateral indices can be found in the guide provided by the Australian Bureau of Statistics (2016) (see the chapter entitled: "CRITERIA FOR ASSESSING MULTILATERAL METHODS"). Interesting considerations concerning tests for price indices in the case of dynamic scanner data sets can be found in Zhang et al. (2019), where the authors, on the basis of the COLI (Cost of Living Index) and COGI (Cost of Goods Index) concepts, focus on five main tests for a dynamic item universe (identity test, fixed basket test, upper bound test, lower bound test and responsiveness test). We generally agree with the authors of the previously cited work that it is difficult to correctly and reliably define other tests for multilateral indices based on dynamic scanner data. In fact, the genesis of multilateral methods comes from international comparisons where the reference set of countries or regions is fixed and any two countries (or regions) are eligible for comparison. For example, it is difficult to imagine the transitivity axiom in the case when, for the considered time interval $[0, t]$, the set of matched products for the base (0) and the current ($t$) period is empty. Nevertheless, following the guidelines from the Australian Bureau of Statistics (2016) or the paper by Diewert (2020), we consider a fairly wide set of tests for multilateral indices (see "Appendix 1") assuming that the conditions for their use are met (e.g. a set of matched products over a period of time is never empty).
Please note that the discussed multilateral index formulas (GK, GEKS, CCDI, TPD) meet most of the requirements at the same time, such as transitivity, the multi-period identity test, positivity and continuity, proportionality, homogeneity in prices, commensurability, symmetry in the treatment of time periods or symmetry in the treatment of products. However, the discussed indices differ in terms of the total set of tests they meet. For instance: the GEKS, CCDI and TPD indices do not satisfy the basket test, the Geary-Khamis and TPD indices do not satisfy the responsiveness test to imputed prices while the GEKS or CCDI can incorporate the imputed prices of missing products, and the homogeneity in quantities does not hold in the case of the Geary-Khamis formula. Among the methods discussed so far, the SPQ index is the only multilateral index that satisfies the identity test, which is a stronger requirement than the lack of chain drift.
Please note that the identity test (see "Appendix 1") is quite a controversial test, because bilaterally it only takes into account the base and current periods and it is too strong a requirement for most multilateral indices, e.g. for the GEKS, TPD or Geary-Khamis indices. Moreover, at first glance, the term 'identity test' can be misleading, because the literature also considers definitions and concepts such as the 'multi-period identity test' or a 'weak' and 'strong' version of the identity test (Eurostat 2022, pp. 9 and 30). Nevertheless, this paper is in line with recognized and commonly accepted publications on axioms for multilateral indices (Australian Bureau of Statistics 2016; Zhang et al. 2019) and a recent report by Eurostat (2022), which place the identity test among other requirements for multilateral indices and distinguish the above-mentioned, potentially misleading terms. A careful study of these versions of the identity test leads to the conclusion that they are completely separate requirements which, in addition, lead to different consequences (although strong identity implies weak identity). In particular, the multi-period identity test is a key property in eliminating chain drift, but the idea of chain drift (see Sect. 1) is rather related to the concept of a weak version of the identity test.
Please also note that in practice it would be extremely rare to observe prices and at the same time quantities returning to their original values (except for strongly seasonal goods), and that is why a strong identity test (here and in the Eurostat report briefly called the identity test), despite its restrictiveness, is considered by many authors (see also Diewert 2020). The cited Eurostat report also indicates that the SPQ index is an example of a multilateral method that passes the identity test (page 32), which, given the growing interest in this index, makes us treat the identity test seriously. At the recent 17th Ottawa Group Meeting (OGM) in Rome (June 2022), in which the author of the article had the pleasure to participate, a lot of attention was paid to both the SPQ index and the identity test (e.g. Diewert 2022). As in this article, during the OGM meeting the identity test was always understood in its strongest version. Zhang et al. (2019) even suggested that this is one of the most important requirements for multilateral indices used in the dynamic world of scanner data. This article therefore treats the identity test as a separate, restrictive but at the same time important requirement for the final selection of a multilateral index.
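The difference between the two versions of the test can be made concrete with a small numerical sketch. It assumes the Fisher-based GEKS index and hypothetical data for three months in which prices in month 2 return to their month 0 values; in one scenario the quantities return as well, in the other they do not. The helper names `fisher` and `geks` and all numbers are illustrative assumptions.

```python
from math import prod, sqrt


def fisher(p_a, q_a, p_b, q_b):
    """Fisher index comparing period b with period a on matched products."""
    matched = p_a.keys() & p_b.keys()
    laspeyres = (sum(p_b[i] * q_a[i] for i in matched)
                 / sum(p_a[i] * q_a[i] for i in matched))
    paasche = (sum(p_b[i] * q_b[i] for i in matched)
               / sum(p_a[i] * q_b[i] for i in matched))
    return sqrt(laspeyres * paasche)


def geks(prices, quantities, start, end):
    """Fisher-based GEKS index between two periods of the window."""
    n = len(prices)
    ratios = [fisher(prices[k], quantities[k], prices[end], quantities[end])
              / fisher(prices[k], quantities[k], prices[start], quantities[start])
              for k in range(n)]
    return prod(ratios) ** (1.0 / n)


# Prices in month 2 are identical to prices in month 0.
prices = [{"A": 1.0, "B": 1.0}, {"A": 0.5, "B": 1.0}, {"A": 1.0, "B": 1.0}]
# Scenario 1: quantities also return to their month 0 values.
quantities_reverted = [{"A": 10, "B": 10}, {"A": 50, "B": 5}, {"A": 10, "B": 10}]
# Scenario 2: only prices return, quantities stay different.
quantities_changed = [{"A": 10, "B": 10}, {"A": 50, "B": 5}, {"A": 20, "B": 5}]

weak_case = geks(prices, quantities_reverted, 0, 2)
strong_case = geks(prices, quantities_changed, 0, 2)
print(weak_case)    # 1.0: the multi-period identity test is met
print(strong_case)  # about 1.038: the (strong) identity test is violated
```

In the second scenario only the comparisons that run through month 1 differ, yet this is enough to move the GEKS index away from one.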

Other approaches
The economic approach to the price index theory assumes that the quantities are functions of prices. This approach assumes that households optimise their purchase by maximising their utility under a budget constraint or by minimizing the cost of purchases for a given utility level. The multilateral methods involve different assumptions about the functional form of the utility function. For instance, the GEKS index is exact for a "flexible" functional form, i.e. it expresses the price differences experienced by optimising consumers without imposing restrictive assumptions about how they can substitute between products.
The main assumption of the stochastic approach is that prices or price changes can be estimated by an econometric model. Since weighted least squares (WLS) is a well-known statistical method, the use of the TPD is clearly understandable by users and the index is quite easy to implement. Please note that the use of the TPD index is justified in situations where the individual products are closely related, and thus their prices can be well approximated by the regression model. From that point of view, this method is not sufficient at higher levels of data aggregation. Note also that the econometric model explaining price behavior can be extended with additional variables related to product characteristics, as in the Time Dummy Hedonic method (TDH) (de Haan and Krsinich 2018). This is particularly important if the data set includes numerous new and disappearing products. An interesting comparison of the TPD and TDH indices can be found in de Haan et al. (2021).
Some other criteria can be taken into consideration when selecting a multilateral method. With the exception of the SPQ index, the choice of a multilateral index by default entails choosing the width of the time window and the splicing technique in order to avoid revising already published indices. Some empirical works suggest that the TPD and GK indices are more sensitive to these choices compared to other multilateral indices (GEKS, CCDI). On the other hand, the GEKS and CCDI indices seem to be more sensitive to the so-called "dump prices" (dumping occurs when a product is sold at reduced prices before being removed from the market). Białek and Berȩsewicz (2021) draw attention to the differences in the calculation time when determining price indices. In particular, they suggest that compiling the Geary-Khamis (GK) and TPD indices is more time-consuming than calculating the GEKS or CCDI indices. In another approach, knowing the theoretical stochastic processes that describe the price evolution of a homogeneous product group, we can determine the expected values for a price change for such a product group in a given time period. The theoretical value of the price index determined in this way can then be compared with the multilateral index calculated on the basis of the generated artificial scanner data set.

Proposition of the general class of semi-GEKS indices
In the "classical" approach to constructing the GEKS-type indices, the bilateral price index formula which is used in the body of the GEKS index is a superlative one. In other words, although the standard GEKS method uses the Fisher indices as inputs (Chessa et al. 2017), other superlative indices are possible choices as well, e.g. the Törnqvist or Walsh indices (van Loon and Roels 2018; Diewert and Fox 2018). Moreover, in the paper by Chessa et al. (2017), we can read that "the bilateral indices should satisfy the time reversal test". The choice of the superlative indices as an input for GEKS has its justification in the economic approach, since the superlative indices are considered to be the best proxies for the Cost of Living Index (International Labour Office 2004). However, please note that the concept of multilateral indices is not based on the COLI framework and requirements for multilateral methods differ from those dedicated to bilateral ones. The time reversibility requirement, which allows the GEKS index to be transitive, enables expressing the GEKS index in a more intuitive, quotient form:
$$P^{0,t}_{GEKS} = \prod_{\tau=0}^{T} \left( \frac{P^{\tau,t}}{P^{\tau,0}} \right)^{\frac{1}{T+1}}, \qquad (8)$$
where $P^{\tau,s}$ is the chosen bilateral price index formula (for $s = 0, t$).
Due to the above-presented remarks, in this paper we propose a general class of indices based on the idea of the GEKS method, where the base bilateral index breaks all "classical" assumptions: a) it is not superlative; b) it fails the time reversal test; c) it uses quantities only from one of the two compared periods. Due to point c), we will call it the general class of semi-GEKS indices, and it will be denoted here by GS-GEKS (the General Semi-GEKS). Our proposal assumes that the formula $P^{\tau,s}$, which compares the current period $s$ with the base period $\tau$, can be written in the following form:
$$P^{\tau,s} = f_{G_{\tau,s}}(q^\tau, p^\tau, p^s), \qquad (9)$$
where the function $f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$ takes into account products from the set $G_{\tau,s}$.
We have a list of minimal requirements for the function $f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$: R1) It must be a positive and continuous function of its arguments and $f_{G_\tau}(q^\tau, p^\tau, p^\tau) = 1$; R2) The proportionality in current prices and inverse proportionality in base prices must hold (homogeneity of degree +1 in current prices and homogeneity of degree -1 in base prices), i.e. we expect that $f_{G_{\tau,s}}(q^\tau, m p^\tau, k p^s) = \frac{k}{m} f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$; R3) The only possible reaction to identical quantity changes is no reaction, i.e. $f_{G_{\tau,s}}(k q^\tau, p^\tau, p^s) = f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$; R4) For any diagonal matrix $\Lambda$ with positive diagonal elements, it holds that $f_{G_{\tau,s}}(\Lambda^{-1} q^\tau, \Lambda p^\tau, \Lambda p^s) = f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$ (commensurability); R5) For two different data sets $G^{*}_{\tau,s}$ and $G^{**}_{\tau,s}$ being subsets of $G_{\tau,s}$, where one is contained in the other (e.g. $G^{*}_{\tau,s} \subset G^{**}_{\tau,s}$), we obtain, in general, two different function values, i.e. $f_{G^{*}_{\tau,s}}(q^\tau, p^\tau, p^s) \neq f_{G^{**}_{\tau,s}}(q^\tau, p^\tau, p^s)$. Please note that conditions R1, R2 and R4 make the formula $f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$ satisfy all axioms from the system of minimum requirements of a price index by Martini (1992) (identity, commensurability, linear homogeneity).
The requirements R1-R5 are fundamental and they will be justified with respect to good axiomatic properties of the multilateral GS-GEKS method in the next part of the paper (see Theorem 1). We will also show (see Sect. 5) that the proposed general class of semi-GEKS indices is not empty and, in fact, it includes a large number of potential price index formulas. We will also consider the stronger version of the R5 condition (not obligatory), which can be written as: R6) For two different data sets $G^{*}_{\tau,s}$ and $G^{**}_{\tau,s}$ such that $G^{*}_{\tau,s} \subset G^{**}_{\tau,s}$, it holds that $f_{G^{*}_{\tau,s}}(q^\tau, p^\tau, p^s) \geq f_{G^{**}_{\tau,s}}(q^\tau, p^\tau, p^s)$. Finally, let us also take into consideration an additional requirement of monotonicity: R7) If $p_i^s \leq p_i^{s*}$ for any $i$-th product from $G_{\tau,s}$, then it holds that $f_{G_{\tau,s}}(q^\tau, p^\tau, p^s) \leq f_{G_{\tau,s}}(q^\tau, p^\tau, p^{s*})$. As will be shown (see Theorem 1), the requirements R6 and R7 are crucial with respect to the lower and upper bound tests.
Taking the bilateral index formula (9) as an input in the GEKS body (8), we obtain the following form of the proposed general class of semi-GEKS indices:
$$P^{0,t}_{GS-GEKS} = \prod_{\tau=0}^{T} \left( \frac{f_{G_{\tau,t}}(q^\tau, p^\tau, p^t)}{f_{G_{\tau,0}}(q^\tau, p^\tau, p^0)} \right)^{\frac{1}{T+1}}. \qquad (10)$$
The following theorem can be proved (see "Appendix 2").
Theorem 1 Under restrictions R1-R5, each GS-GEKS index (10) satisfies the transitivity, identity, multi-period identity, responsiveness, continuity, positivity and normalization, commensurability, price proportionality, homogeneity in prices and homogeneity in quantities tests. If the requirements R6 and R7 are additionally fulfilled, this index also satisfies the lower and upper bound tests.
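The construction of a GS-GEKS index can be sketched in a few lines: the bilateral function is passed in as an argument and receives only the base-period quantities. The sketch below is an illustration under stated assumptions, not the implementation from any package: the names `gs_geks` and `f_laspeyres`, the data layout and the numbers are hypothetical, and a Laspeyres-type function is used merely as one possible choice of the bilateral function. The example checks the identity test on data in which only prices return to their base values.

```python
from math import prod


def gs_geks(f, prices, quantities, t):
    """General semi-GEKS index between months 0 and t for a bilateral function f.

    f(matched, q_base, p_base, p_cur) uses quantities from the base period only.
    """
    n = len(prices)
    ratios = []
    for k in range(n):
        matched_t = prices[k].keys() & prices[t].keys()  # products matched in k and t
        matched_0 = prices[k].keys() & prices[0].keys()  # products matched in k and 0
        ratios.append(f(matched_t, quantities[k], prices[k], prices[t])
                      / f(matched_0, quantities[k], prices[k], prices[0]))
    return prod(ratios) ** (1.0 / n)


def f_laspeyres(matched, q_base, p_base, p_cur):
    """Laspeyres-type bilateral function (illustrative choice)."""
    return (sum(q_base[i] * p_cur[i] for i in matched)
            / sum(q_base[i] * p_base[i] for i in matched))


# Hypothetical data: prices in month 2 equal prices in month 0, quantities do not.
prices = [{"A": 1.0, "B": 1.0}, {"A": 0.5, "B": 1.0}, {"A": 1.0, "B": 1.0}]
quantities = [{"A": 10, "B": 10}, {"A": 50, "B": 5}, {"A": 20, "B": 5}]

index_0_2 = gs_geks(f_laspeyres, prices, quantities, 2)
print(index_0_2)  # 1.0: the identity test is met although quantities differ
```

Because the current-period quantities never enter the bilateral function, identical price vectors in months 0 and 2 make every ratio in the product equal to one, provided that the same products are available in both months.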

Special cases of the class of GS-GEKS indices
This section presents propositions of two new multilateral price indices being special cases of the semi-GEKS class of indices.

Proposition based on the Laspeyres formula
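Let us define the function $f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$ introduced in Sect. 4 as follows:
$$f^{L}_{G_{\tau,s}}(q^\tau, p^\tau, p^s) = \frac{\sum_{i \in G_{\tau,s}} q_i^\tau p_i^s}{\sum_{i \in G_{\tau,s}} q_i^\tau p_i^\tau}, \qquad (11)$$
where the "L" superscript refers to the Laspeyres formula. Putting (11) in formula (10), we obtain
$$P^{0,t}_{GEKS-L} = \prod_{\tau=0}^{T} \left( \frac{\sum_{i \in G_{\tau,t}} q_i^\tau p_i^t \Big/ \sum_{i \in G_{\tau,t}} q_i^\tau p_i^\tau}{\sum_{i \in G_{\tau,0}} q_i^\tau p_i^0 \Big/ \sum_{i \in G_{\tau,0}} q_i^\tau p_i^\tau} \right)^{\frac{1}{T+1}}. \qquad (12)$$
It is easy to verify that the function $f^{L}_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$ satisfies the requirements R1-R5 and R7 described in Sect. 4 (the proof is omitted). As a consequence, the proposed formula (12) realizes the thesis of Theorem 1. However, we still do not know under what circumstances (if any) the condition R6 holds. In the further part of the work we will use an additional normalisation that will allow the proposed formulas to satisfy the requirement R6 and consequently the upper and lower bound test. Please note that the GEKS-L index can be treated as the generalization of the Fisher price index formula ($P^{0,t}_{F}$) to the multi-period case. In fact, in a static item universe $G$ observed over the two period time interval $[0, 1]$, we obtain
$$P^{0,1}_{GEKS-L} = \sqrt{\frac{\sum_{i \in G} q_i^0 p_i^1}{\sum_{i \in G} q_i^0 p_i^0} \cdot \frac{\sum_{i \in G} q_i^1 p_i^1}{\sum_{i \in G} q_i^1 p_i^0}} = P^{0,1}_{F}. \qquad (13)$$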
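The relation between the GEKS-L and Fisher indices in the two-period case can be verified numerically. The following sketch assumes a static item universe and hypothetical data; the function names `f_laspeyres`, `geks_l` and `fisher` are illustrative and do not refer to any package.

```python
from math import prod, sqrt


def f_laspeyres(q_base, p_base, p_cur):
    """Laspeyres-type bilateral function on products matched in both periods."""
    matched = p_base.keys() & p_cur.keys()
    return (sum(q_base[i] * p_cur[i] for i in matched)
            / sum(q_base[i] * p_base[i] for i in matched))


def geks_l(prices, quantities, t):
    """GEKS-L index between months 0 and t over the whole window."""
    n = len(prices)
    ratios = [f_laspeyres(quantities[k], prices[k], prices[t])
              / f_laspeyres(quantities[k], prices[k], prices[0])
              for k in range(n)]
    return prod(ratios) ** (1.0 / n)


def fisher(p0, q0, p1, q1):
    """Bilateral Fisher index: geometric mean of Laspeyres and Paasche."""
    laspeyres = (sum(p1[i] * q0[i] for i in p0)
                 / sum(p0[i] * q0[i] for i in p0))
    paasche = (sum(p1[i] * q1[i] for i in p0)
               / sum(p0[i] * q1[i] for i in p0))
    return sqrt(laspeyres * paasche)


# Hypothetical two-period window with the same two products in both months.
prices = [{"A": 1.0, "B": 1.0}, {"A": 0.5, "B": 1.0}]
quantities = [{"A": 10, "B": 10}, {"A": 50, "B": 5}]

left = geks_l(prices, quantities, 1)
right = fisher(prices[0], quantities[0], prices[1], quantities[1])
print(abs(left - right) < 1e-12)  # True
```

For the base month the ratio reduces to the Laspeyres index and for the current month to the Paasche index, so their geometric mean is the Fisher index.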

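Proposition based on the geometric Laspeyres formula
Let us define the function $f_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$ introduced in Sect. 4 as follows:
$$f^{GL}_{G_{\tau,s}}(q^\tau, p^\tau, p^s) = \prod_{i \in G_{\tau,s}} \left( \frac{p_i^s}{p_i^\tau} \right)^{w_i^\tau}, \qquad (14)$$
where
$$w_i^\tau = \frac{p_i^\tau q_i^\tau}{\sum_{k \in G_{\tau,s}} p_k^\tau q_k^\tau}, \qquad (15)$$
and the "GL" superscript refers to the geometric Laspeyres formula (von der Lippe 2007). Putting (14) in formula (10), we obtain
$$P^{0,t}_{GEKS-GL} = \prod_{\tau=0}^{T} \left( \frac{\prod_{i \in G_{\tau,t}} \left( p_i^t / p_i^\tau \right)^{w_i^\tau}}{\prod_{i \in G_{\tau,0}} \left( p_i^0 / p_i^\tau \right)^{w_i^\tau}} \right)^{\frac{1}{T+1}}, \qquad (16)$$
where the weights in the numerator and in the denominator are calculated according to (15) on the sets $G_{\tau,t}$ and $G_{\tau,0}$ respectively. It is easy to verify that the function $f^{GL}_{G_{\tau,s}}(q^\tau, p^\tau, p^s)$ satisfies the requirements R1-R5 and R7 described in Sect. 4 (the proof is omitted). As a consequence, the proposed formula (16) realizes the thesis of Theorem 1. Please note that the GEKS-GL index can be treated as the generalisation of the Törnqvist (1936) price index formula ($P^{0,t}_{T}$) to the multi-period case. In fact, in a static item universe $G$ observed over the two period time interval $[0, 1]$, we obtain
$$P^{0,1}_{GEKS-GL} = \prod_{i \in G} \left( \frac{p_i^1}{p_i^0} \right)^{\frac{w_i^0 + w_i^1}{2}} = P^{0,1}_{T}. \qquad (17)$$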
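As in the Laspeyres case, the two-period relation between the GEKS-GL and Törnqvist indices can be checked numerically. The sketch below uses hypothetical data for a static item universe; the function names `f_geo_laspeyres`, `geks_gl` and `tornqvist` are illustrative assumptions.

```python
from math import exp, log, prod


def f_geo_laspeyres(q_base, p_base, p_cur):
    """Geometric Laspeyres function with base-period expenditure shares."""
    matched = p_base.keys() & p_cur.keys()
    total = sum(p_base[i] * q_base[i] for i in matched)
    return exp(sum(p_base[i] * q_base[i] / total * log(p_cur[i] / p_base[i])
                   for i in matched))


def geks_gl(prices, quantities, t):
    """GEKS-GL index between months 0 and t over the whole window."""
    n = len(prices)
    ratios = [f_geo_laspeyres(quantities[k], prices[k], prices[t])
              / f_geo_laspeyres(quantities[k], prices[k], prices[0])
              for k in range(n)]
    return prod(ratios) ** (1.0 / n)


def tornqvist(p0, q0, p1, q1):
    """Bilateral Törnqvist index with averaged expenditure shares."""
    e0 = sum(p0[i] * q0[i] for i in p0)
    e1 = sum(p1[i] * q1[i] for i in p0)
    return exp(sum(0.5 * (p0[i] * q0[i] / e0 + p1[i] * q1[i] / e1)
                   * log(p1[i] / p0[i]) for i in p0))


# Hypothetical two-period window with the same two products in both months.
prices = [{"A": 1.0, "B": 2.0}, {"A": 0.8, "B": 2.2}]
quantities = [{"A": 10, "B": 10}, {"A": 30, "B": 8}]

left = geks_gl(prices, quantities, 1)
right = tornqvist(prices[0], quantities[0], prices[1], quantities[1])
print(abs(left - right) < 1e-12)  # True
```

The base month contributes the price relatives weighted by base-period shares and the current month contributes them weighted by current-period shares, so the geometric mean uses the averaged shares of the Törnqvist formula.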

Remark
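In a situation where we simply increase the range of already sold, typical products from the level $G^{*}_{\tau,s}$ to the level $G^{**}_{\tau,s}$, we can expect that the price index itself will not change significantly, i.e.
$$f^{L}_{G^{*}_{\tau,s}}(q^\tau, p^\tau, p^s) \approx f^{L}_{G^{**}_{\tau,s}}(q^\tau, p^\tau, p^s). \qquad (18)$$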
Let us consider for a moment the following, normalized version of the $f^{L}_{G_{\tau,s}}$ function: The $f^{NL}_{G_{\tau,s}}$ function satisfies the requirements R1-R5 and R7. Moreover, since $N^{*}_{\tau,s} = \mathrm{card}(G^{*}_{\tau,s}) < N^{**}_{\tau,s} = \mathrm{card}(G^{**}_{\tau,s})$, from (18) and (19) we can expect in practice (especially for big differences in the considered product sets $G^{*}_{\tau,s}$ and $G^{**}_{\tau,s}$) that $f_{G^{*}_{\tau,s}}(q^\tau, p^\tau, p^s) \geq f_{G^{**}_{\tau,s}}(q^\tau, p^\tau, p^s)$. Thus, the normalized function $f^{NL}_{G_{\tau,s}}$ seems to satisfy the requirement R6. The same considerations could be repeated for the normalized version of the $f^{GL}_{G_{\tau,s}}$ function, i.e. for the function defined as follows: Consequently, from the point of view of the lower and upper bound test, it may be interesting to consider the following normalized versions of the GEKS-L and GEKS-GL indices: and In analogy to the weighted GEKS index (Melser 2018), it would also be possible to consider the following weighted versions of the GEKS-L and GEKS-GL indices: and where the weights concerning the period $\tau$ could be defined as follows: We will not take formulas (21), (22), (23) and (24) into further consideration in the paper, but the weighted versions (23) and (24) are implemented in our PriceIndices R package (see Białek 2021 and also Sect. 6 for more details about this package), i.e. they are available via package functions: WGEKS-L() and WGEKS-GL().

Empirical illustration
In the following empirical study we use scanner data from one retail chain in Poland, i.e. monthly data on long grain rice (subgroup of COICOP 5 group: 011111), ground coffee (subgroup of COICOP 5 group: 012111), drinking yoghurt (subgroup of COICOP 5 group: 011441) and white sugar (subgroup of COICOP 5 group: 011811) sold in over 210 outlets during the period from December 2019 to December 2020 (352705 records, which means 210 MB of data). Before calculating the price indices, the data sets were carefully prepared. First, after deleting the records with missing data and performing the deduplication process, the products were classified into the relevant elementary groups (COICOP level 5) and then into their subgroups (local COICOP level 6). Product classification was performed using the data_selecting() and data_classification() functions from the PriceIndices R package (Białek 2021). The first function required manual preparation of dictionaries of keywords and phrases that identified individual product groups. The second function was used for problematic, previously unclassified products and required manual preparation of learning samples based on historical data. The classification itself was based on machine learning using random trees and the XGBoost algorithm (Tianqi and Carlo 2016). Next, the product matching was carried out based on the available GTIN (Global Trade Item Number) bar codes, internal retail chain codes and product labels. To match products we used the data_matching() function from the PriceIndices package. To be more precise: products with two identical codes, or with one identical code and an identical description, were automatically matched. Products were also matched if one of their codes was identical and the Jaro-Winkler distance (Jaro 1989) of their descriptions was smaller than the fixed precision value of 0.02.
In the last step before calculating the indices, two data filters were applied to remove unrepresentative products from the database, i.e. the data_filtering() function from the cited package was used. The extreme price filter (Białek and Berȩsewicz 2021) was applied to eliminate products with a more than three-fold price increase or a more than two-fold price drop from month to month. The low sale filter (van Loon and Roels 2018) was used to eliminate products with relatively low sales from the sample (almost 30% of products were removed). The results obtained for the GEKS-L, GEKS-GL, GEKS, Geary-Khamis, TPD and SPQ indices are presented in Fig. 1.
Fig. 1 Comparison of selected multilateral indices for four homogeneous groups of food products
Based on Fig. 1, we can conclude that although the values of multilateral indices do not usually differ crucially, noticeable differences between the studied indices were observed in the case of white sugar and ground coffee (see the scale on the Y-axis). First, an attempt was made to explain the reasons for the differences between the indices in the case of these two data sets. In order to identify the possible determinants of these differences, the following were examined for each of the four scanner data sets (see "Appendix 3"): a) monthly fractions of products remaining on sale since December 2019 (Fig. 3); b) monthly values of Pearson's correlation coefficient between prices and quantities (Fig. 4); c) monthly coefficients of variation of product prices (Fig. 5); d) monthly coefficients of variation of product quantities (Fig. 6). Analyses a) and b) led to a rather surprising conclusion: neither the level of price-quantity correlation nor the life expectancy of the products differentiates the analyzed data sets (see Figs. 3 and 4), i.e. in all cases we observe a weak or moderate negative correlation between prices and quantities, and the changes in the product assortment are similar across the data sets. These features therefore did not contribute to the differences between the index values, which seems to contradict the common opinion (Chessa et al. 2017) that high product churn (inflow and outflow of products) implies differences between multilateral indices. Moreover, price volatility (measured by the coefficient of variation), which is the main cause of differences between bilateral price indices, also turned out not to differentiate the analyzed data sets (see Fig. 5), and thus it was not price volatility that determined the differences between the values of the indices. Quite unexpectedly, the volatility of the quantity of products sold seems to have a clear impact on the differences between multilateral indices. Please note that the coefficients of variation of product quantities are clearly higher for the white sugar and ground coffee data sets (Fig. 6). However, this thread requires further research.
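The extreme price filter and the diagnostics b) to d) discussed above can be sketched in a few lines of Python. This is only an illustration and not the code of data_filtering(): the function names are hypothetical, a "more than double price drop" is interpreted here as a month-to-month price ratio below 0.5, and the coefficient of variation uses the population standard deviation, which is also an assumption.

```python
from math import sqrt


def extreme_price_changes(prev_prices: dict, curr_prices: dict,
                          max_ratio: float = 3.0, min_ratio: float = 0.5) -> list:
    """Ids of products whose month-to-month price ratio is extreme.

    Only products observed in both months are examined.
    """
    flagged = []
    for product_id in prev_prices.keys() & curr_prices.keys():
        ratio = curr_prices[product_id] / prev_prices[product_id]
        if ratio > max_ratio or ratio < min_ratio:
            flagged.append(product_id)
    return sorted(flagged)


def coefficient_of_variation(values: list) -> float:
    """Population standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return sqrt(variance) / mean


def pearson_correlation(x: list, y: list) -> float:
    """Pearson's correlation coefficient, e.g. between prices and quantities."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```

Applied month by month to the prices and quantities of one product group, the last two functions give series analogous to those examined in analyses b), c) and d).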
As clear differences between the indices were observed in two of the four analyzed scanner data sets, the graphical presentation of multilateral indices (Fig. 1) was supplemented with exact, albeit averaged, differences between them. For this purpose, the average absolute differences between the indices, based on all monthly index values, were determined by using the compare_distances() function from the PriceIndices package (see Tables 2, 3, 4, 5 in "Appendix 3"). It was noted that the GEKS-L and GEKS-GL indices approximate each other and, moreover, that their values are quite close to those of the GEKS index. As a rule, the values of the SPQ index are also the closest to the GEKS-L and GEKS-GL indices (in three cases, i.e. with the exception of the ground coffee data set). This observation seems to confirm that the indices meeting the identity test form a separate group. The Geary-Khamis index is a good proxy for the Time Product Dummy (TPD) index, which confirms some previous results (Chessa et al. 2017; Białek and Berȩsewicz 2021), but it always seems to be the most distant from the GEKS-L index.
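The averaged distance used in this comparison can be sketched as follows. The snippet only mirrors the idea of a mean absolute difference between monthly index values; it is not the code of compare_distances(), and the function names and any index values passed to it are hypothetical.

```python
from itertools import combinations


def mean_absolute_difference(series_a: list, series_b: list) -> float:
    """Average absolute difference between two index series of equal length."""
    if len(series_a) != len(series_b):
        raise ValueError("index series must cover the same months")
    return sum(abs(a - b) for a, b in zip(series_a, series_b)) / len(series_a)


def pairwise_distances(indices: dict) -> dict:
    """Mean absolute difference for every pair of named index series."""
    return {
        (name_a, name_b): mean_absolute_difference(indices[name_a], indices[name_b])
        for name_a, name_b in combinations(sorted(indices), 2)
    }
```

For example, pairwise_distances({"GEKS": [...], "GEKS-L": [...], "SPQ": [...]}) returns one averaged distance for each pair of methods, which is the kind of information summarised in a distance table.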
The SPQ, GEKS-L and GEKS-GL indices require shorter calculation times compared to other index methods. This fact is not surprising for the SPQ index, as it does not work on the traditional time window. Meanwhile, the GEKS-L and GEKS-GL indices do not take into account the quantities from the current period, and thus they save on calculation time. The longest computation time (mean time of 100 repetitions) recorded in the study was for the TPD and GK indices (Fig. 2).
In the study we also decided to check the sensitivity of the considered multilateral indices to a change of the machine learning method in a one-stage classification procedure, i.e. with no use of text mining techniques as a first step (see "Appendix 3"). As mentioned above, random trees (XGBoost algorithm) were used in the analysis because this method achieved the highest level of accuracy. However, other machine learning methods were also included in the experiments (United Nations Economic Commission for Europe 2021): naive Bayes (NB), k-Nearest Neighbours for k=10 (kNN) and Support Vector Machines (SVM) provided slightly less effective classification models (see Table 5). When examining the sensitivity of price indices to the choice of classification method, the target index for each data set was defined as the multilateral index computed on a training set of correctly classified products from the whole time interval (30% of products from the interval: December 2019 to December 2020). Then, for each set of products obtained with a given classification method, the corresponding multilateral indices were determined. Finally, these indices were compared with the target index using the compare_to_target() function from the PriceIndices package. The resulting mean absolute differences for all product groups are presented in Tables 6, 7, 8 and 9. As one can see, the differences for a given machine learning method are similar, regardless of the choice of the multilateral index. This suggests that the sensitivity of multilateral indices to the choice of the classification method is comparable. However, the choice of the classification method itself is of course important here.
The higher the percentage of products incorrectly classified by a given method, the greater the differences between the indices determined on the set of products classified by this method and on the set of correctly classified products. We believe that this thread requires further in-depth research.

Conclusions
The proposed general class of multilateral indices (GS-GEKS) is, on the one hand, based on the idea of the GEKS method and, on the other hand, differs crucially from this method in the assumptions concerning the base formula of the index. Although the GS-GEKS class indices do not require the base index used in their body to be superlative (or even symmetric), multilateral indices of this type have, as shown in Theorem 1, almost all the required properties, including the restrictive identity test. Moreover, the two specific cases of this general class proposed in the paper, i.e. the GEKS-L and GEKS-GL indices (see Sect. 5), meet the lower and upper bound tests after a certain normalisation. Both the empirical and simulation studies confirmed that the two proposed indices behave rationally, and any differences in relation to the other considered multilateral indices appear only with large variability of quantity in homogeneous groups of products. Quite surprisingly, price volatility, price-quantity correlation and product life expectancy did not play a significant role in the empirical study as determinants of differences between multilateral indices (see Sect. 6). Moreover, in the empirical study (see Sect. 6), we found that the computation times needed for the GEKS-L and GEKS-GL indices are noticeably shorter compared to most other multilateral indices (the exception is the SPQ index, which, however, does not take into account the entire time window). It seems that the sensitivity of the GEKS-L and GEKS-GL methods to the choice of the product classification method is quite similar to that of the remaining multilateral indices.
We also emphasize that although the base formula for the GEKS-L and GEKS-GL indices is the Laspeyres index and the geometric Laspeyres index, respectively (neither of them is superlative), the GEKS-L and GEKS-GL indices can be treated as a generalization of the Fisher and Törnqvist indices, which are superlative (see Sect. 5). The paper shows that the general nature of the GS-GEKS class allows the construction of further, theoretically interesting formulas of multilateral price indices (see Remark in Sect. 5.2). Although these indices were not the subject of the study in this article, they retain the properties of GS-GEKS class indices and are therefore an interesting research direction for the future. It should also be noted that both the previously known multilateral indices and the new indices proposed and discussed in the paper are implemented in the PriceIndices R package (Białek 2021), and thus the reader can verify their usefulness on their own data sets.

Appendix 1: Tests for multilateral indices
Let $P$ and $Q$ denote all prices and quantities observed in the time interval $[0, T]$, i.e. $P = [p^0, p^1, \ldots, p^T]$, $Q = [q^0, q^1, \ldots, q^T]$, where $p^t$ and $q^t$ mean the vector of prices and the vector of quantities of products sold at time $t$, respectively. Let us denote by $P^{0,t}(P, Q)$ the considered multilateral price index defined for the entire time window $[0, T]$. The list of potential tests for that index is as follows.

Transitivity
Transitivity means that $P^{0,t}(P, Q) = P^{0,s}(P, Q) \, P^{s,t}(P, Q)$ for any $0 \le s < t \le T$.

Identity
This property means that the index equals one if all prices revert back to their initial level, i.e. if it holds that $p_i^t = p_i^0$ for $i \in G_{0,t}$ then $P^{0,t}(P, Q) = 1$. We assume here that the item universe is the same at periods 0 and $t$.
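The transitivity and identity tests can be checked numerically on a toy example. The sketch below uses the standard GEKS index built on Fisher indices over a fixed set of two products; the data are invented for illustration and the function names are hypothetical. Prices in period 2 return to their period 0 values while quantities do not.

```python
from math import prod, sqrt


def fisher(p: list, q: list, s: int, t: int) -> float:
    """Fisher price index between periods s and t (fixed set of products)."""
    laspeyres = (sum(pt * qs for pt, qs in zip(p[t], q[s]))
                 / sum(ps * qs for ps, qs in zip(p[s], q[s])))
    paasche = (sum(pt * qt for pt, qt in zip(p[t], q[t]))
               / sum(ps * qt for ps, qt in zip(p[s], q[t])))
    return sqrt(laspeyres * paasche)


def geks(p: list, q: list, s: int, t: int) -> float:
    """GEKS index: geometric mean of Fisher links through every period tau."""
    periods = range(len(p))
    links = prod(fisher(p, q, s, tau) * fisher(p, q, tau, t) for tau in periods)
    return links ** (1.0 / len(p))


# Invented data: rows are periods 0, 1, 2; columns are two products.
prices = [[1.0, 1.0], [0.5, 1.0], [1.0, 1.0]]          # period 2 prices = period 0 prices
quantities = [[10.0, 10.0], [30.0, 5.0], [20.0, 2.0]]  # quantities do not return

# Transitivity: the direct index equals the product of the two sub-period indices.
transitivity_gap = (geks(prices, quantities, 0, 2)
                    - geks(prices, quantities, 0, 1) * geks(prices, quantities, 1, 2))

# Identity: prices have returned to their initial level, yet GEKS is not one.
identity_value = geks(prices, quantities, 0, 2)
```

In this example the transitivity gap is zero up to rounding, while identity_value is clearly different from one, which illustrates why a transitive index such as GEKS can still fail the identity test.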

Multi period identity test
This property means that if all prices and quantities revert back to their initial level, the chained index will equal unity, i.e. if it holds that $p_i^t = p_i^0$ and $q_i^t = q_i^0$ for $i \in G_{0,t}$ then we obtain $P^{0,1}(P, Q) \times P^{1,2}(P, Q) \times \cdots \times P^{t-1,t}(P, Q) = 1$. We assume here that the item universe is the same at periods 0 and $t$.
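The kind of chain drift that this test is meant to rule out can be seen with a chained bilateral Laspeyres index, used here only as a simple non-transitive example and not as one of the multilateral indices studied in the paper. The data are invented: both prices and quantities in period 2 return to their period 0 values.

```python
def laspeyres(p: list, q: list, s: int, t: int) -> float:
    """Bilateral Laspeyres price index between periods s and t."""
    return (sum(pt * qs for pt, qs in zip(p[t], q[s]))
            / sum(ps * qs for ps, qs in zip(p[s], q[s])))


# Invented data: a temporary price cut for product 1 in period 1,
# after which prices and quantities return to their initial levels.
prices = [[1.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
quantities = [[10.0, 10.0], [30.0, 5.0], [10.0, 10.0]]

# Chained index: product of month-to-month links 0 -> 1 and 1 -> 2.
chained = laspeyres(prices, quantities, 0, 1) * laspeyres(prices, quantities, 1, 2)

# Direct comparison of period 2 with period 0.
direct = laspeyres(prices, quantities, 0, 2)
```

Here the links are 0.75 and 1.75, so the chained index equals 1.3125 although nothing has changed between periods 0 and 2, whereas the direct index equals one.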

Upper bound test
If $G_0 \subset G_t$ and $p_i^t \le p_i^0$ for all $i \in G_0$, then $P^{0,t}(P, Q) \le 1$.

Lower bound test
If $G_t \subset G_0$ and $p_i^t \ge p_i^0$ for all $i \in G_t$, then $P^{0,t}(P, Q) \ge 1$.

Responsiveness test
If $G_0 \neq G_t$ and $p_i^t = p_i^0$ for all $i \in G_{0,t}$, then $P^{0,t}(P, Q)$ cannot always equal one, regardless of the sets $G_0 \setminus G_t$ and $G_t \setminus G_0$.

Price proportionality
If all prices are proportional in the compared periods 0 and $t$, i.e. $p_i^t = k p_i^0$ for all $i \in G_{0,t}$ and some positive $k$, then the price index depends only on this proportion: $P^{0,t}(P, Q) = k$. We assume here that the item universe is the same at periods 0 and $t$.

Homogeneity in quantities
Rescaling the quantities in any $s$-th period does not influence the price index, i.e. for any positive $k$ it holds that $P^{0,t}(P, q^0, \ldots, k q^s, \ldots, q^t) = P^{0,t}(P, q^0, \ldots, q^s, \ldots, q^t)$.

Homogeneity in prices
Rescaling the prices in the current period changes the price index by the same proportion, i.e. for any positive $k$ it holds that $P^{0,t}(p^0, p^1, \ldots, k p^t, Q) = k P^{0,t}(p^0, p^1, \ldots, p^t, Q)$.
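As a worked illustration of how such a test is verified, the short derivation below checks homogeneity in prices for the standard GEKS index built on Fisher indices $P_F$. It is a sketch under simplifying assumptions that are not taken from the appendix: the item universe is fixed over the whole window and the current period satisfies $t \ge 1$.

```latex
% Standard GEKS index and its Fisher base formula (fixed item universe assumed)
\begin{align*}
P_{GEKS}^{0,t} &= \prod_{\tau=0}^{T} \left( P_F^{0,\tau} \, P_F^{\tau,t} \right)^{\frac{1}{T+1}}, \\
P_F^{\tau,t} &= \sqrt{ \frac{\sum_i p_i^t q_i^\tau}{\sum_i p_i^\tau q_i^\tau}
                \cdot \frac{\sum_i p_i^t q_i^t}{\sum_i p_i^\tau q_i^t} }.
\end{align*}
% Replace p^t by k p^t with k > 0:
%   tau != t : P_F^{0,tau} is unchanged and P_F^{tau,t} becomes k P_F^{tau,t}
%   tau  = t : P_F^{0,t} becomes k P_F^{0,t} and P_F^{t,t} = 1 is unchanged
% so every factor of the product is multiplied by k exactly once.
\begin{align*}
\tilde{P}_{GEKS}^{0,t}
  = \prod_{\tau=0}^{T} \left( k \, P_F^{0,\tau} \, P_F^{\tau,t} \right)^{\frac{1}{T+1}}
  = k^{\frac{T+1}{T+1}} \, P_{GEKS}^{0,t}
  = k \, P_{GEKS}^{0,t}.
\end{align*}
```

The same bookkeeping, i.e. tracking how each bilateral link reacts to the rescaling, also shows that the Fisher links are unaffected when all quantities of a single period are multiplied by a constant.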

Transitivity
Let us consider periods $s$ and $t$ from the time window $[0, T]$ such that $0 \le s < t \le T$. We obtain

Identity
Let us assume that $G_0 = G_t = G_{0,t}$ and $p_i^t = p_i^0$ for $i \in G_{0,t}$. We have

Multi period identity test
Let us assume that $p_i^t = p_i^0$ and $q_i^t = q_i^0$ for $i \in G_{0,t} = G_0 = G_t$. We obtain Please note that this proof does not require the condition $q_i^t = q_i^0$.

Responsiveness test
Let us assume that $G_0 \neq G_t$ and $p_i^t = p_i^0$ for all $i \in G_{0,t}$. Since $G_0 \neq G_t$, we know that for at least one period $\tau_0$ we have $G_{\tau_0,t} \neq G_{\tau_0,0} \cap G_{\tau_0,t}$ and, from our initial assumption (see Sect. 2), we have that $G_{\tau,0} \cap G_{\tau,t} \neq \emptyset$ for any $\tau$. From the assumption R5 (see Sect. 4), we In a similar way, it can be shown that $f_{G_{\tau_0,0}}(q^{\tau_0}, p^{\tau_0}, p^0) \neq f_{G_{\tau_0,0} \cap G_{\tau_0,t}}(q^{\tau_0}, p^{\tau_0}, p^0)$. Thus, in general, it holds that Since we assume that prices at the compared time moments are identical, i.e. $p_i^t = p_i^0$, we obtain finally that:

Continuity, positivity and normalization
Continuity, positivity and normalization are a consequence of the requirement R1.

Price proportionality
The assumption that the item universe is the same at periods 0 and $t$ means that $G_0 = G_t = G_{0,t}$ and also $G_{\tau,0} = G_{\tau,t}$ for any $\tau$. Let us assume that $p_i^t = k p_i^0$ for all $i \in G_{0,t}$ and some positive $k$. As a consequence, we obtain From the requirement R2 (see Sect. 4), we have that

Lower bound test
Let us assume that the requirement R6 is additionally satisfied (see Sect. 4). Let us also assume that $G_t \subset G_0$, which leads to the conclusion that $G_{\tau,t} = (G_{\tau,0} \cap G_{\tau,t}) \subset G_{\tau,0}$ for any $\tau$. From R6, we obtain Due to the assumption that $p_i^t \ge p_i^0$ for all $i \in G_t \subset G_0$, from R7, we have that which means that $P^{0,t}_{GS-GEKS} \ge 1$.