Abstract
Replacing traditional data sources allows National Statistical Institutes to collect large samples of product items with corresponding prices recorded at more frequent intervals. For some product categories, frequent replacement of product items requires a dynamic approach to defining the product sample, which can be achieved by updating the sample monthly. In a month-on-month comparison of price changes, bilateral indices must be chained, which can lead to chain drift bias. Multilateral methods are designed to reduce chain drift, although they do not eliminate it completely. Their disadvantages are ambiguous interpretability and a revision problem. The main objective of this paper is to show how multilateral methods can be implemented for official price statistics in practice. The multilateral GEKS method is decomposed into elementary components that identify the effect of the successive price relatives used in its compilation. The importance of the decomposition is shown when interpreting the GEKS index in the context of official price statistics. We demonstrate that it can lead to an inaccurate public dissemination of the consumer price index. Moreover, we propose a new method for the splice extension, which is required to avoid revising previously published indices. The method uses a Stein-type shrinkage estimator for the splice extension that is applied additively to the last published index. In the empirical study, we use web scraped data obtained from the Slovak market to compare bilateral and multilateral indices.
1 Introduction
The collection of product prices through web scraping is a new data source that is being explored by National Statistical Institutes (NSIs). Web scraping makes it possible to obtain, on a daily basis, the prices of individual product items, which can be offered by several online retailers through a comparison website platform. Daily web scraped prices must be aggregated to a monthly level. Since information on sales for product-offers is not available, the aggregation can only be obtained by applying an unweighted arithmetic average or a geometric average for each product item in a given month, as proposed in the guide by Eurostat (2020). The guide mainly discusses the legal aspects and information technology requirements of web scraping for official price statistics; its treatment of the aggregation of web scraped prices is limited. For example, it does not discuss the treatment of web scraped prices downloaded from website platforms that compare product-offers from several online retailers, where multiple prices are observed for the same product item on a given day. Moreover, no guidance is given on the price filters that are required to clean price data of anomalous or extreme values. Daily web scraped prices can be very volatile due to sales offers and the competitive strategies of online retailers.
The advantages of collecting prices through web scraping compared to semi-automatic detection (“copy and paste”) are studied by Polidoro et al. (2015) using the sample of consumer electronics and airfares for Italian harmonised index of consumer prices (HICP) compilation. The authors of the paper measure the advantages of web scraping by the amount of the workload that can be saved, and, based on the results, they conclude that by the adoption of web scraping, NSIs can increase a number of elementary product items in the sample and save a considerable amount of time. Moreover, a direct control and rules that can affect prices can be exerted on the selection of elementary product items and periods with sales-offers.
The paper by Glaser-Opitzová (2019) studies the use, in the context of consumer price indices, of price data that are collected directly from high street retailers, also called scanner or transaction data. The advantage of scanner data is that they contain sales and quantities sold for each elementary product item, and, therefore, they provide a more accurate economic reflection of consumer behaviour. In this case, the consumer price index only captures period-on-period price changes of elementary product items that were sold in the given periods, weighted by their quantities sold. Web scraping only collects prices of product-offers and does not provide any information on sales.
However, a substantial share of consumer purchasing, for some products, is carried out online. Hence, NSIs should consider web scraping as a tool for data collection of prices to more accurately reflect the consumer behaviour in price statistics.
Through web scraping, this study obtains a sample of all product items that are available on offer. For product categories with a high attrition rate, NSIs need to select an appropriate price index that captures all new and disappearing product items in monthly samples for estimating the average price changes, which reflect the economic activity more accurately. The compilation of consumer price indices by multilateral methods offers to create more representative monthly samples of product items for product categories with a high attrition rate.
The main objective of this paper is to demonstrate and discuss the application of multilateral methods from the implementation perspective. In particular, the theoretical framework of the Gini–Eltető–Köves–Szulc (GEKS) method, including its technique for compiling bilateral indices and its interpretation, is studied in detail, which is needed when implementing the GEKS index in the production environment of NSIs. Moreover, the study proposes a new method for the splice extension, which is required to avoid revising previously published indices. In the empirical study, we reuse the web scraped data for the product categories washing machines, dryers and dishwashers from the Slovak market, scraped by the Statistical Office of the Slovak Republic, as in the paper by Knížat (2023).
The Statistical Office of the Slovak Republic and the Faculty of Economic Informatics at the University of Economics in Bratislava have conducted several research studies on using web scraped and scanner data in the compilation of consumer price indices. The overall aim of the project is to test alternatives to the data sources for price statistics that are currently used at the Statistical Office of the Slovak Republic. The parallel research consists of testing various economic approaches for estimating price indices that could potentially replace the bilateral consumer price indices, which are used with traditionally collected price data.
The paper is organised as follows: Sect. 2 summarises the current research on multilateral methods in price indices and the most recent guidelines published by the European Statistical Agency (Eurostat). Section 3 outlines a theoretical framework of the compilation method for the GEKS index for both rolling and expanding window approaches. Furthermore, it shows two common extension methods that are used to correct the revision problem of the multilateral methods. We also propose a new extension method that uses a Stein-type shrinkage estimator to splice the new index onto the already published indices. Section 4 shows the application of the theoretical framework to web scraped data from the Slovak market and discusses the corresponding results. Section 5 restates the main objective of this paper and outlines its achievements. It provides a summary and recommendations in the context of official price statistics that can be useful for NSIs.
2 Literature survey
In the recently published guide by Eurostat (2022), multilateral methods for estimating consumer price indices are described in detail. The guide demonstrates, from the practical perspective, the methodology of various multilateral methods, which compare price changes between more than two time periods simultaneously. Its application is centred on scanner data but can also be adapted, in some cases, to web scraped data. Multilateral methods had previously been applied to compare price changes across countries or regions; their use across time periods was first proposed by Ivancic et al. (2011). Since the proposal, they have been widely studied by many academics and practitioners (de Haan 2008; de Haan and Krsinich 2012; Chessa 2016, etc.).
The paper by de Haan and Hendriks (2013) proposes an estimation of consumer price indices with daily frequency using web scraped prices. It states that daily price indices may be quite volatile and that their practical use is questionable since NSIs publish consumer price indices on a monthly basis. It points out an important issue regarding the treatment of prices in regular and sales periods, i.e., regular prices could stay constant over time while sales prices show an upward trend. If consumers prefer to purchase products in the sale period, which is the case shown in de Haan (2008) on supermarket scanner data, it would be more appropriate to consider a change in sales prices in the index compilation. With regard to multilateral methods, the authors demonstrate that the GEKS-Jevons index and the unweighted Time-Product Dummy (TPD) index are likely to have a similar trend, with the TPD index perhaps being a little smoother.
The paper by Glaser-Opitzová (2019) (in Slovak language) discusses an implementation of scanner data for price statistics. It presents a current state of compiling prices by the Statistical Office of the Slovak Republic, outlines characteristics of scanner data and shows various methodologies for calculating the price index. In the empirical study for the product category of refrigerators, the paper by Knížat and Glaser-Opitzová (2023) (in Slovak language) compares bilateral and multilateral price indices, with no conclusion drawn on what is the most appropriate price index. The paper by Knížat (2023) shows various possibilities for aggregating daily web scraped prices on monthly level and demonstrates its application on product categories of washing machines, dryers and dishwashers scraped by the Statistical Office of the Slovak Republic. It also makes a comparison of the results for bilateral and multilateral price indices. However, it does not lead to the conclusion that would suggest the most suitable price index for these particular product categories.
The recent paper by Białek (2022) discusses the advantages of scanner data for the calculation of consumer price indices. It concludes that the main advantage is that scanner data contain transaction information on purchases for each product item in the data set. The paper states that a key challenge is the choice of an index formula that reduces chain drift. It presents various multilateral methods for the estimation of price indices and proposes two new multilateral indices which resemble the GEKS method. The author shows that the proposed new multilateral indices meet the identity test, which the GEKS index does not satisfy.
Multilateral methods suffer from the revision problem that occurs when the new index is calculated, i.e., with the new index, all previous indices in the window are recalculated and changed. The change of the previously published indices is not a desirable phenomenon for NSIs. It can be prevented by splicing the new index onto the already published indices. The paper by Chessa (2019) compares the results of various splicing methods and points out that a longer time window could reduce a difference between individual splicing methods. It proposes to use a fixed base splicing method that is more resistant to a drift bias.
The authors of this paper assume that the reader has a solid understanding of the theory of price statistics. Readers who require a background in the theory of price indices are referred to the monograph by Balk (2008). Moreover, the manual by ILO et al. (2004) provides a comprehensive overview of the theoretical framework of both bilateral and multilateral methods used for the estimation of price indices.
3 Research methodology
The estimation of consumer price indices (CPIs) by multilateral methods compares price changes for more than two time periods simultaneously over a specified time window for the given set of product items. In the recently published Eurostat guide on multilateral methods in the harmonised index of consumer prices, Eurostat (2022), three fundamentally different methods are presented.
The fixed-effects regression model is used for estimating a time-product dummy (TPD) index, where the estimated coefficients of the time-dummy variable corresponding to each time period are the price indices. A similar version of the regression model is used to estimate the hedonic index, which utilises characteristic parameters of individual product items as the regressor variables. The other proposed method is the Geary–Khamis (GK) index, which is a quality-adjusted value index. The estimation method and the different versions of the GK index are described in detail in the paper by Chessa (2016).
The third presented method, the GEKS method, is based on compilation of bilateral indices using a geometric average for a given time window. The bilateral indices are calculated between specific periods in the time window and, therefore, matched-item samples are needed between these successive periods. In this section, the authors demonstrate its compilation for the bilateral Jevons index which belongs to a family of the unweighted indices. The Jevons index satisfies a condition of the transitivity that is a requirement for the GEKS method, refer to Eurostat (2022) for details. A thorough understanding of the GEKS method is required by NSIs since its individual components might have a profound effect on the published consumer price index. The generalisation of the GEKS compilation method to other multilateral methods is not possible since its estimation and interpretation can be fundamentally different.
3.1 Multilateral price indices
Multilateral methods can be applied by two different approaches, the rolling window or the expanding window. We recast and study both approaches applied on the GEKS method. Both approaches can be generalised to other multilateral methods.
The current research, discussed in Sect. 2, mostly concerns the application of the rolling window approach. The guide on multilateral methods by Eurostat (2022) also demonstrates the application of the rolling window approach and only briefly mentions the expanding window approach. However, the compilation of the multilateral index by the expanding window is fundamentally different compared to the rolling window, and, therefore, it can produce significantly different price indices.
We use R software to code the GEKS method. For the rolling window approach, we mostly utilise the package IndexNumR created by White (2022). The expanding window approach, including the proposed additive splicing method, is coded by the authors.
3.1.1 Rolling window GEKS method
We start with an introduction and a thorough analysis of the compilation technique for the GEKS method by the rolling window approach. The GEKS index compares the average price changes between the base period 0 and the current period t. It can be defined as (Eurostat 2022):
\[ I^{0,t} = \prod_{l=0}^{T} \left( P^{0,l} \times P^{l,t} \right)^{\frac{1}{T+1}} \tag{1} \]
where \(P^{l,t}\) are bilateral indices calculated between periods l and t and T defines the time window, usually set to include a full year, T = 12, i.e., a 13-month time window. A sample of product items must be matched between periods l and t in order to calculate the bilateral index. For l = 0 and l = t, \(P^{0,0} = P^{t,t} = 1\). The requirement of the GEKS method is that the bilateral index used in its compilation must satisfy the criterion of transitivity (Eurostat 2022), \(P^{l,t} = \frac{1}{P^{t,l}}\).
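In the unweighted case, when using web scraped data, the only bilateral index satisfying this criterion is the Jevons index, which is defined as follows (Eurostat 2022):
\[ P^{0,t} = \prod_{i=1}^{N^{0,t}} \left( \frac{p_{i}^{t}}{p_{i}^{0}} \right)^{\frac{1}{N^{0,t}}} \tag{2} \]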
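where \(p_{i}^{0}\) and \(p_{i}^{t}\) are prices of product items at the base period 0 and at the current period t, respectively, for the product items \(i=1,\dots ,N\). The number of product items matched between the two periods 0 and t is denoted as \(N^{0,t}\). Substituting (2) in (1) and setting T = 12, the GEKS-Jevons index between the base period 0 and the current period t is compiled as:
\[ I^{0,t} = \prod_{l=0}^{12} \left[ \prod_{i=1}^{N^{0,l}} \left( \frac{p_{i}^{l}}{p_{i}^{0}} \right)^{\frac{1}{N^{0,l}}} \times \prod_{i=1}^{N^{l,t}} \left( \frac{p_{i}^{t}}{p_{i}^{l}} \right)^{\frac{1}{N^{l,t}}} \right]^{\frac{1}{13}} \tag{3} \]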
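In the compilation of (3), for t = 1, …, T, we need to calculate a \(13\times 13\) matrix of Jevons indices between periods l and t and then compile the GEKS index by a geometric average for each period t. To demonstrate the compilation for t = 1, the GEKS-Jevons index between periods 0 and 1 is compiled as:
\[ I^{0,1} = \prod_{l=0}^{12} \left[ \prod_{i=1}^{N^{0,l}} \left( \frac{p_{i}^{l}}{p_{i}^{0}} \right)^{\frac{1}{N^{0,l}}} \times \prod_{i=1}^{N^{l,1}} \left( \frac{p_{i}^{1}}{p_{i}^{l}} \right)^{\frac{1}{N^{l,1}}} \right]^{\frac{1}{13}} \tag{4} \]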
Clearly, if (4) only includes a sample of product items that are matched between periods 0 and 1, the same matched-item sample applies to both terms in each square bracket, and, hence, the prices \(p_{i}^{2}, p_{i}^{3}, \ldots, p_{i}^{12}\) cancel out. As a result, (4) simplifies to the Jevons index in (2), \(I^{0,1} = \left( \prod_{i = 1}^{N^{0,1}} \left( \frac{p_{i}^{1}}{p_{i}^{0}} \right)^{\frac{13}{N^{0,1}}} \right)^{\frac{1}{13}} = P^{0,1}\). Similarly, for periods t = 2, …, 12, if we assume matched-item samples between periods 0 and 2, 0 and 3, …, 0 and 12, we get the following Jevons indices: \(I^{0,2} = \left( \prod_{i = 1}^{N^{0,2}} \left( \frac{p_{i}^{2}}{p_{i}^{0}} \right)^{\frac{13}{N^{0,2}}} \right)^{\frac{1}{13}} = P^{0,2}\), \(I^{0,3} = \left( \prod_{i = 1}^{N^{0,3}} \left( \frac{p_{i}^{3}}{p_{i}^{0}} \right)^{\frac{13}{N^{0,3}}} \right)^{\frac{1}{13}} = P^{0,3}\), …, \(I^{0,12} = \left( \prod_{i = 1}^{N^{0,12}} \left( \frac{p_{i}^{12}}{p_{i}^{0}} \right)^{\frac{13}{N^{0,12}}} \right)^{\frac{1}{13}} = P^{0,12}\), respectively.
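The cancellation can be written out explicitly. As a sketch, assume that a single set of \(N\) product items is priced in every period of the window; this is a stronger condition than the pairwise matching described above and is introduced here only for illustration. For each item and each link period l, the intermediate price drops out, so every one of the 13 square brackets equals the Jevons index itself:

\[
\frac{p_{i}^{l}}{p_{i}^{0}} \times \frac{p_{i}^{1}}{p_{i}^{l}} = \frac{p_{i}^{1}}{p_{i}^{0}}
\quad \Longrightarrow \quad
I^{0,1} = \left( \prod_{l=0}^{12} \prod_{i=1}^{N} \left( \frac{p_{i}^{1}}{p_{i}^{0}} \right)^{\frac{1}{N}} \right)^{\frac{1}{13}}
= \prod_{i=1}^{N} \left( \frac{p_{i}^{1}}{p_{i}^{0}} \right)^{\frac{1}{N}} = P^{0,1}.
\]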
Hence, a difference between the Jevons and the GEKS index is driven by the unmatched product items between periods 0 and t. In (4), these unmatched product items enter the compilation from the third term as \(\left[ {\frac{{p_{i}^{2} }}{{p_{i}^{0} }} \times \frac{{p_{i}^{1} }}{{p_{i}^{2} }}} \right], \left[ {\frac{{p_{i}^{3} }}{{p_{i}^{0} }} \times \frac{{p_{i}^{1} }}{{p_{i}^{3} }}} \right], \ldots ,\left[ {\frac{{p_{i}^{12} }}{{p_{i}^{0} }} \times \frac{{p_{i}^{1} }}{{p_{i}^{12} }}} \right]\) within each corresponding square bracket term.
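The role of unmatched items can be illustrated with a small numerical sketch. The code below is written in Python purely for illustration (it is not the authors' R implementation), uses a 3-month window instead of 13 months, and all prices, item labels and function names are hypothetical. It compiles the GEKS-Jevons index as a geometric average over all link periods, with each bilateral Jevons index computed on the items priced in both of its periods.

```python
from math import exp, log

def jevons(prices, a, b):
    """Jevons index from period a to period b over items priced in both."""
    matched = prices[a].keys() & prices[b].keys()
    if not matched:
        raise ValueError(f"no matched items between periods {a} and {b}")
    return exp(sum(log(prices[b][i] / prices[a][i]) for i in matched) / len(matched))

def geks_jevons(prices, base, current):
    """Geometric mean, over every link period l, of P(base, l) * P(l, current)."""
    logs = [log(jevons(prices, base, l)) + log(jevons(prices, l, current))
            for l in range(len(prices))]
    return exp(sum(logs) / len(logs))

# Hypothetical 3-month windows: one dict (item -> monthly price) per period.
stable = [{"A": 100, "B": 200}, {"A": 110, "B": 190}, {"A": 105, "B": 210}]
churned = [{"A": 100, "B": 200}, {"A": 110, "B": 190, "C": 300}, {"A": 105, "C": 330}]

for name, window in (("stable", stable), ("churned", churned)):
    print(name, round(jevons(window, 0, 1), 5), round(geks_jevons(window, 0, 1), 5))
```

With the stable sample, the two printed values coincide; once item B disappears and item C enters, the link through period 2 uses different matched samples and the GEKS-Jevons index moves away from the Jevons index.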
The GEKS-Jevons index in (4), compared between periods 0 and 1, is also affected by average price changes between periods 0 and 2, 2 and 1, 0 and 3, 3 and 1, …, 0 and 12, 12 and 1, respectively. Clearly, the index would not be interpretable from an economic point of view. However, in the production environment, the first published GEKS index is at t = T + 1, which only compares the current period with previous periods. Noting that the rolling window of 13 months is shifted forward by 1 month, so that it covers periods 1, …, T, T + 1, the GEKS-Jevons index between the base period 1 and the first published period 13 can be expressed as:
\[ I^{1,13} = \prod_{l=1}^{13} \left[ \prod_{i=1}^{N^{1,l}} \left( \frac{p_{i}^{l}}{p_{i}^{1}} \right)^{\frac{1}{N^{1,l}}} \times \prod_{i=1}^{N^{l,13}} \left( \frac{p_{i}^{13}}{p_{i}^{l}} \right)^{\frac{1}{N^{l,13}}} \right]^{\frac{1}{13}} \tag{5} \]
Similarly, in (5), a difference between the Jevons and the GEKS index is driven by the unmatched product items between periods 1 and 13, \(\left[\frac{{p}_{i}^{2}}{{p}_{i}^{1}}\times \frac{{p}_{i}^{13}}{{p}_{i}^{2}}\right], \left[\frac{{p}_{i}^{3}}{{p}_{i}^{1}}\times \frac{{p}_{i}^{13}}{{p}_{i}^{3}}\right],\dots ,\left[\frac{{p}_{i}^{12}}{{p}_{i}^{1}}\times \frac{{p}_{i}^{13}}{{p}_{i}^{12}}\right]\). Under the unmatched-item sample assumption, the GEKS index at t = 13 is affected by average price changes in previous periods. This implies that, from the practical perspective, NSIs would encounter challenges in the interpretation of period-on-period average price changes. For example, if the average price change between compared periods 1 and 13 (1 = January 2021 and 13 = January 2022) was mainly driven by the unmatched product items compared between periods 1 and 3 and 1 and 10, the statement would read: The year-on-year inflation in January 2022 of 2% was mainly driven by the average price increase of product items between January 2021 and March 2021 and January 2021 and October 2021.
It is evident that the above statement does not make any economic sense. Therefore, with the implementation of the GEKS method for official price statistics, NSIs require a fundamental amendment of their methodological framework, including the change of the interpretation of consumer price indices. A statement of the simple period-on-period average price changes might no longer be valid.
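The dependence of the published index on intermediate periods can be made visible by splitting the logarithm of the GEKS index into one term per link period. The following Python sketch is illustrative only: the function name and the 3x3 matrix of bilateral indices are hypothetical, and the matrix is constructed so that \(P^{a,b} = 1/P^{b,a}\).

```python
from math import exp, log

def link_contributions(bilateral, base, current):
    """Log-contribution of each link period l to the GEKS index I(base, current)."""
    n = len(bilateral)
    return [(log(bilateral[base][l]) + log(bilateral[l][current])) / n
            for l in range(n)]

# Hypothetical matrix of bilateral indices P[a][b] with P[a][b] = 1 / P[b][a].
P = [[1.0, 1.02, 1.05],
     [1 / 1.02, 1.0, 1.01],
     [1 / 1.05, 1 / 1.01, 1.0]]

shares = link_contributions(P, 0, 2)
geks = exp(sum(shares))
direct = shares[0] + shares[2]    # link periods equal to the base or current period
indirect = sum(shares) - direct   # routed through the intermediate period
print(round(geks, 5), round(direct, 5), round(indirect, 5))
```

The two link periods that coincide with the base and the current period each contribute the direct bilateral comparison; every other term is a comparison routed through an intermediate period, which is the part that complicates a simple period-on-period reading of the index.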
The other drawback of the GEKS method, and all other multilateral methods, is that the compilation of the new index at T + 1 causes a revision of previous indices in the window. This can be overcome by splicing the new index at T + 1 onto the indices compiled in the previous window, which can also be interpreted as redefining the base period back to 0 in the case of the rolling window approach. The use of splicing methods is discussed in Sect. 3.2.
3.1.2 Expanding window GEKS method
In the expanding window approach, the time window T is not fixed but expands, so t = T and the base period always remains at t = 0. It implies that for the first time period t = 1, the GEKS-Jevons index simplifies to the Jevons index.
For the second period t = 2, so T = 2 with a 3-month window, the compilation of the GEKS-Jevons index is expressed as follows:
\[ I^{0,2} = \prod_{l=0}^{2} \left[ \prod_{i=1}^{N^{0,l}} \left( \frac{p_{i}^{l}}{p_{i}^{0}} \right)^{\frac{1}{N^{0,l}}} \times \prod_{i=1}^{N^{l,2}} \left( \frac{p_{i}^{2}}{p_{i}^{l}} \right)^{\frac{1}{N^{l,2}}} \right]^{\frac{1}{3}} \tag{6} \]
And so on for t = 3, …, 12. The window can be reset at t = 12, or it can be expanded further up to a 25-month window. Conclusions similar to those for the rolling window approach can be drawn for the matched and unmatched product items. The advantage of the expanding window approach could be its interpretation, since it only compares average price changes for periods in the same year if it is reset at t = 12. In the empirical analysis in Sect. 4, we test the application of both approaches using web scraped prices.
Again, a problem of the revision occurs since, for example, in the case of t = 2, the index at t = 1 is recalculated. As mentioned previously, this can be overcome by the application of the splice extension. In the following section, we outline three splicing methods that can be adapted to both rolling and expanding window approaches.
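The revision in the expanding window can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the authors' R code: the prices and function names are hypothetical, and each bilateral Jevons index is computed on the items priced in both of its periods.

```python
from math import exp, log

def jevons(prices, a, b):
    """Jevons index from period a to period b over items priced in both."""
    matched = prices[a].keys() & prices[b].keys()
    if not matched:
        raise ValueError(f"no matched items between periods {a} and {b}")
    return exp(sum(log(prices[b][i] / prices[a][i]) for i in matched) / len(matched))

def expanding_geks(prices, t, window_end):
    """GEKS-Jevons index I(0, t) compiled on the window of periods 0..window_end."""
    links = range(window_end + 1)
    logs = [log(jevons(prices, 0, l)) + log(jevons(prices, l, t)) for l in links]
    return exp(sum(logs) / len(logs))

# Hypothetical monthly prices with item B leaving and item C entering the sample.
prices = [{"A": 100, "B": 200}, {"A": 110, "B": 190, "C": 300}, {"A": 105, "C": 330}]

first = expanding_geks(prices, 1, window_end=1)    # compiled when t = 1 is current
revised = expanding_geks(prices, 1, window_end=2)  # recompiled once t = 2 arrives
print(round(first, 5), round(revised, 5))
```

On the two-period window, the index for t = 1 equals the Jevons index; after the window expands to include t = 2, the same index is recompiled with an additional link period and changes, which is the revision that a splice extension is meant to avoid.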
3.2 Splicing methods
To overcome the problem of the revision of previously compiled multilateral indices, we recast two common splicing methods that can be adapted to both rolling and expanding window approaches. Moreover, we propose a new splicing method that uses a shrinkage estimator for creating a link between the new index and the previously compiled index series.
The movement splice uses only a previously calculated index for period T as the link period for the new T + 1 index. For both rolling and expanding window approaches, it can be defined as (Eurostat 2022):
The mean splice takes into account all previously compiled index series as link periods for the new T + 1 index. For the rolling window approach, the new index T + 1, using a 13-months window, is spliced as:
Equation (8) is a geometric average of indices compiled in the previous window that are weighted by the ratio of the new and previous indices compiled in the new window. Equation (8) can similarly be adapted to the expanding window approach, where the number of terms in the round brackets increases with T. However, for the expanding window approach, this could create an overly smoothed index series, which is examined in the empirical study in Sect. 4.
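The two multiplicative splices can be sketched in Python as follows. This is a minimal illustration based on the verbal descriptions above and on the commonly used forms of the movement and mean splices; the function names, argument layout and numbers are assumptions and may differ in notation from (7) and (8). Here `prev_series` holds the indices for periods 0, …, T and `new_window` holds the indices compiled in the shifted window for periods 1, …, T + 1, based on period 1.

```python
from math import exp, log

def movement_splice(prev_series, new_window):
    """Extend the previous series by the last period-on-period movement of the new window."""
    return prev_series[-1] * new_window[-1] / new_window[-2]

def mean_splice(prev_series, new_window):
    """Geometric mean of the links through every period shared by both windows."""
    overlap = len(new_window) - 1
    links = [prev_series[-overlap + k] * new_window[-1] / new_window[k]
             for k in range(overlap)]
    return exp(sum(log(x) for x in links) / overlap)

# Hypothetical indices: previous series for periods 0..3, new window for periods 1..4.
prev_series = [1.00, 1.02, 1.01, 1.03]
new_window = [1.00, 0.99, 1.015, 1.03]
print(round(movement_splice(prev_series, new_window), 5),
      round(mean_splice(prev_series, new_window), 5))
```

If the new window reproduced the previously compiled movements exactly, every link would give the same value and the two splices would coincide; they differ only to the extent that the new window revises earlier indices.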
We propose a new splicing method that is derived based on the Stein-type shrinkage estimator (Stein 1975). The Stein-type shrinkage estimator was originally developed to correct a sample covariance matrix bias for high-dimensional data by shrinking its largely dispersed eigenvalues, refer to Tong et al. (2014) for more details. The new splice is additive and it can be defined as:
where \(\lambda^{1,T + 1}\) is a shrinkage estimator. In the rolling window approach, we propose the following shrinkage estimator which is an adjusted version presented in the paper by Tong et al. (2014):
In the denominator of (10), the GEKS indices compiled in the previous window are divided by the difference between the new index \(I^{1,T + 1}\) and the previous indices \(I^{1,T + 1 - t}\) compiled in the new window, and the resulting terms are summed. The numerator of (10) is kept fixed as the window length. It can also be adapted to the expanding window approach, where the number of terms in the denominator expands correspondingly with the expanding window length T.
The splicing method in (9) differs significantly from the other two splicing methods since it is additive rather than multiplicative. It should be noted that this is the first application of this new type of additive splicing method. It requires further theoretical and analytical research, which is beyond the scope of this study.
Furthermore, it is a well-known fact that after the application of the splice, multilateral indices do not meet the criterion of transitivity, \(I^{0,t} \ne \frac{1}{I^{t,0}}\). In the case of the bilateral index, this violation would normally invalidate the use of the index for official price statistics.
4 Results and discussions
We use web scraped data in the empirical study of the theoretical framework outlined in the previous sections. The daily web scraped prices of product categories washing machines, dryers and dishwashers (all in ECOICOP 5-digit: 05.3.1.2, refer to Eurostat, RAMON) are available from 01 December 2019 until 31 December 2021. The information was scraped from the website https://www.heureka.sk/, which is an online platform that offers various products on sale from different retailers. We collect multiple daily prices for each product item for a particular day.
Hence, before the compilation of price indices, an aggregation of daily prices to a monthly level is required. We use the geometric average for the aggregation of product items' prices, first on a daily and then on a monthly level. The aggregation by the geometric average reduces the sometimes undesirable daily price volatility. For more details related to the analysis of daily price volatility and the aggregation of daily prices, refer to the paper by Knížat (2023), which uses the same set of web scraped data in its case study.
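The two-stage aggregation can be sketched in Python as follows. The sketch is illustrative only: the record layout (item identifier, ISO date string, price), the function names and the example offers are assumptions, and no price filtering is applied.

```python
from collections import defaultdict
from math import exp, log

def geo_mean(values):
    """Unweighted geometric average of positive numbers."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))

def monthly_prices(offers):
    """Aggregate (item, 'YYYY-MM-DD', price) offers per day, then per month."""
    daily = defaultdict(list)
    for item, day, price in offers:
        daily[(item, day)].append(price)
    monthly = defaultdict(list)
    for (item, day), day_prices in daily.items():
        monthly[(item, day[:7])].append(geo_mean(day_prices))
    return {key: geo_mean(vals) for key, vals in monthly.items()}

# Hypothetical offers: two retailers quote item W1 on the first day, one on the second.
offers = [("W1", "2021-01-01", 100.0), ("W1", "2021-01-01", 400.0),
          ("W1", "2021-01-02", 200.0)]
result = monthly_prices(offers)
print(result)
```

Averaging within each day first gives every day the same weight in the monthly price, regardless of how many retailers quoted the item on that day.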
The following figures display the consumer price indices for rolling and expanding window approaches, respectively, of the above mentioned product categories.
Figure 1 shows the consumer price indices for the product category washing machines with the base period December 2020. Note that the rolling window GEKS-Jevons index is rebased to December 2020 since its compilation requires a period of 13 months prior to the implementation date; this period, starting from December 2019, is also shown in Fig. 1. The fixed base Jevons and the chained Jevons index are calculated with December 2020 as the base period. It is evident that the chained Jevons index suffers from downward chain drift when compared to the fixed base Jevons index. The GEKS-Jevons indices extended by the movement and mean splices are very similar, staying approximately in the middle between the fixed base Jevons and the chained Jevons index. The GEKS-Jevons index extended by the proposed additive splice does not always follow the same trend as the other indices.
Similarly, Fig. 2 shows the consumer price indices for the product category washing machines but with an expanding window when compiling the multilateral GEKS-Jevons indices. The expansion of T is carried out from January 2020 until December 2021 over a 25-month window, with December 2019 as the base period. The mean splice extension of the GEKS-Jevons index is not suitable since it creates an overly smoothed index series. The downward chain drift of the chained Jevons index starts from April 2021; it is not present prior to this date. This can indicate that the stability of the product items in monthly samples could play an important role in the drift, i.e., the addition of new items or the attrition of old items might exacerbate the drift. The GEKS-Jevons index extended by the proposed additive splice stays slightly above the other splicing methods. It follows the trend of the other indices, except for two peaks in June 2020 and March 2021.
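The link between sample stability and the gap between the chained and the fixed base Jevons index can be illustrated with a toy Python example. The prices and function names are hypothetical and the example says nothing about the direction or size of the drift in the real data; it only shows that the two indices coincide on a stable sample and can diverge once items enter and leave.

```python
from math import exp, log

def jevons(prices, a, b):
    """Jevons index from period a to period b over items priced in both."""
    matched = prices[a].keys() & prices[b].keys()
    if not matched:
        raise ValueError(f"no matched items between periods {a} and {b}")
    return exp(sum(log(prices[b][i] / prices[a][i]) for i in matched) / len(matched))

def chained_jevons(prices, t):
    """Product of month-on-month Jevons links from period 0 to period t."""
    index = 1.0
    for s in range(1, t + 1):
        index *= jevons(prices, s - 1, s)
    return index

# Hypothetical monthly prices: a stable sample and one with item replacement.
stable = [{"A": 100, "B": 200}, {"A": 110, "B": 190}, {"A": 105, "B": 210}]
churned = [{"A": 100, "B": 200}, {"A": 110, "B": 190, "C": 300}, {"A": 105, "C": 330}]

for name, sample in (("stable", stable), ("churned", churned)):
    print(name, round(chained_jevons(sample, 2), 5), round(jevons(sample, 0, 2), 5))
```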
Overall, the expanding window GEKS-Jevons index with the movement splice extension seems to be closer to the fixed base Jevons index as compared to the rolling window GEKS-Jevons indices. This could indicate that it further reduces a downward drift which is a phenomenon when using chained indices.
Figure 3 shows the consumer price indices for the product category dryers; the base period and rebasing are defined as in Fig. 1. Similar conclusions as in Fig. 1 can be drawn, except that the rolling window GEKS-Jevons index extended by the additive splice suffers from a significant upward drift.
Figure 4 shows the consumer price indices for the product category dryers with an expanding window when compiling the multilateral GEKS-Jevons indices; the base period and expansion window are defined as in Fig. 2. Again, the mean splice method is not suitable for the extension. Similar remarks as in Fig. 2 apply to the proposed additive splicing method. It suffers from two "high-peak" periods, which are due to its sensitivity to the difference between the current index and some of the previous indices in the denominator of (10), i.e., a large difference might cause a jump in the spliced index.
Figure 5 shows the consumer price indices for the product category dishwashers; the base period and rebasing are defined as in Fig. 1. Again, the chained Jevons index has a tendency to drift downward as compared to the fixed base Jevons index, albeit with a lower magnitude than in the previous product categories. The movement and mean splicing methods are close to the fixed base Jevons index, with the last three index series staying above it. The proposed additive splicing method shows irregular behaviour.
Figure 6 shows the consumer price indices for the product category dishwashers with an expanding window when compiling the multilateral GEKS-Jevons indices; the base period and expansion window are defined as in Fig. 2. The proposed additive splicing method shows a similar trend to the other displayed indices and mostly stays slightly above the fixed base Jevons index. Similar conclusions can be drawn as in the expanding window approach for the two previous product categories.
In general, we can conclude that the expanding window GEKS-Jevons index, extended by the movement splice, has a tendency to further reduce the downward chain drift when compared to the rolling window GEKS-Jevons index. An important observation is that the downward chain drift becomes more evident from the year 2020, which covers the economic period affected by the lockdown of physical outlets. In this period, online shopping served as their substitute, which might have caused more volatile replacement rates in the monthly samples of product items.
The proposed additive splicing method shows some irregular behaviour, particularly when used in the extension of the rolling window GEKS-Jevons index. It is mainly affected by large differences between the current and previous index series in the denominator of (10). Therefore, a further analytical study of the shrinkage estimator is required, with possible adjustments that would correct the aforementioned sensitivity. However, it shows a potential to further reduce the downward chain drift since it has a tendency to be close to the fixed base Jevons index.
5 Conclusions
The main objective of this paper is to show an intricacy of multilateral methods in the compilation of consumer price indices when used for official price statistics. Particularly, we provide a comprehensive overview of the GEKS method that compiles bilateral Jevons indices calculated for each successive periods in the defined window. The paper shows that the published GEKS-Jevons index at the current period can be affected by previous bilateral Jevons index series in the given window. It follows that a challenge for NSIs is to publicly disseminate such consumer price indices in the context of the fundamental definition of period-on-period average price changes. The solution is an amendment of the methodological framework for official price statistics that would include changing the definition of the consumer price index. However, the interpretation of different multilateral indices varies and it is not always easy to understand by a non-expert.
In the empirical study, we show the application of the GEKS-Jevons index for both rolling and expanding window approaches using web scraped data for three different product categories. The expanding window approach seems more likely to further reduce the downward chain drift compared with the rolling window approach. The paper shows that the decision to choose a suitable price index could be different for each product category. It should be mainly driven by the replacement rate in the monthly samples of product items and the empirical testing of different price indices for the particular product category.
Moreover, the revision of an already published index series when compiling a new index at T + 1 using multilateral methods brings additional computational and practical challenges. There are numerous splicing methods that avoid the revision problem, but it is not always straightforward to determine which one is the most appropriate. In the expanding window approach, the movement splice is preferable; it is also easy to calculate and interpret, since it only uses the latest published index in the extension. In addition, we propose a new splice of an additive character, which might have the potential to further reduce the downward chain drift. However, since it is new in the context of the extension of multilateral price indices, it requires further theoretical and analytical research.
We conclude that new data sources for price statistics bring the opportunity to capture price changes in the economy more accurately and that, in some cases, this requires replacing bilateral indices with multilateral methods in the calculation of consumer price indices. Again, we emphasise that a thorough data analysis of each product category is required to assess whether the application of multilateral methods is the most suitable.
The research in this paper can be extended to similarly evaluate the application of other multilateral methods in practice. The estimation methodology of the Geary–Khamis and Time-Product Dummy indices differs significantly from the compilation technique of the GEKS method. Moreover, additional splicing methods that are adaptable for both rolling and expanding window approaches can be explored.
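As an illustration of why the movement splice is easy to calculate, the following Python sketch extends a published series by one period. It assumes the common definition of the movement splice, in which the latest period-on-period movement of the recomputed multilateral series is linked onto the last published index. The function name `movement_splice` and all index values are hypothetical.

```python
def movement_splice(published, recomputed):
    """Extend a published index series by one period without revising it.

    published:  indices already disseminated, latest period last
    recomputed: multilateral indices recalculated on the window that
                includes the new period T + 1, latest period last
    """
    if not published or len(recomputed) < 2:
        raise ValueError("need a published index and two recomputed indices")
    # Latest period-on-period movement in the recomputed series.
    movement = recomputed[-1] / recomputed[-2]
    # Link the movement onto the last published index; earlier values stay fixed.
    return published + [published[-1] * movement]


# Hypothetical index values (illustrative only).
published = [100.0, 101.0, 99.5]
recomputed = [100.0, 100.8, 99.9, 98.901]

extended = movement_splice(published, recomputed)
print(extended)
```

Only the last two values of the recomputed series and the last published index enter the calculation, so the previously published indices are never revised even though the recomputed series differs from them.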
Notes
The guide by Eurostat (2022) also proposes testing a 25-month time window, which can further reduce a potential drift. However, to test the 25-month time window empirically, NSIs would need to collect web scraped prices for at least 3 years prior to the testing analysis.
The Jevons index was first proposed by Jevons (1865).
The GEKS method was originally developed for price comparisons across space (countries, regions etc.), where this interpretation would be more applicable (Eurostat 2022).
The GEKS index series is rebased by dividing all indices by the December 2020 index.
References
Balk, B.M.: Price and Quantity Index Numbers: Models for Measuring Aggregate Change and Difference. Cambridge University Press, Cambridge (2008)
Białek, J.: Improving quality of the scanner CPI: proposition of new multilateral methods. Qual. Quant. (2022). https://doi.org/10.1007/s11135-022-01506-6
Chessa, A.G.: A new methodology for processing scanner data in the Dutch CPI. EURONA Eurostat Rev. Natl. Acc. Macroecon. Indic. 1, 49–70 (2016)
Chessa, A.G.: A comparison of index extension methods for multilateral methods. In: Paper Presented at the 16th Meeting of the Ottawa Group on Price Indices, Rio de Janeiro, Brasil (2019)
de Haan, J.: Reducing drift in chained superlative price indexes for highly disaggregated data. In: Paper Presented at the Eleventh Economic Measurement Group Workshop, Sydney, Australia (2008)
de Haan, J., Hendriks, R.: Online data, fixed effects and the construction of high-frequency price indexes. In: Paper Presented at the Economic Measurement Group Workshop, Sydney, Australia (2013)
de Haan, J., Krsinich, F.: Scanner data and the treatment of quality change in rolling year GEKS price indexes. In: Paper Presented at the Eleventh Economic Measurement Group Workshop, Sydney, Australia (2012)
Eltetö, O., Köves, P.: On a problem of index number computation relating to international comparisons. Stat. Szle. 42, 507–518 (1964)
Eurostat: Guide on Multilateral Methods in the Harmonised Index of Consumer Prices [online]. Luxembourg: Publication Office of the European Union. https://ec.europa.eu/eurostat/documents/3859598/14503841/KS-GQ-21-020-EN-N.pdf/243796c9-f5ad-2155-e546-c94e17d9a7eb?t=1649074284236. Accessed 13 Feb 2023 (2022)
Eurostat: Practical guidelines on web scraping for the HICP. https://ec.europa.eu/eurostat/documents/272892/12032198/Guidelines-web-scraping-HICP-11-2020.pdf/. Accessed 13 Feb 2023 (2020)
Eurostat: RAMON—Reference and management of nomenclatures. https://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM_DTL&StrNom=COICOP_5&StrLanguageCode=EN&IntPcKey=&StrLayoutCode=HIERARCHIC&IntCurrentPage=1. Accessed 30 Nov 2022
Gini, C.: On the circular test of index numbers. Metron 9, 3–24 (1931)
Glaser-Opitzová, H.: New data sources for price statistics and the methods of their processing. Slov. Štat. Demogr. 29(4), 49–66 (2019)
Heureka: https://www.heureka.sk/. Accessed daily for web scraping of prices from 01 December 2019 until 31 December 2021
ILO/IMF/OECD/UNECE/Eurostat/The World Bank: Consumer Price Index Manual: Theory and Practice. ILO Publications, Geneva (2004)
Ivancic, L., Diewert, W.E., Fox, K.J.: Scanner data, time aggregation and the construction of price indexes. J. Econom. 161(1), 24–35 (2011). https://doi.org/10.1016/j.jeconom.2010.09.003
Jevons, W.S.: On the variation of prices and the value of the currency since 1782. J. Stat. Soc. Lond. 28(2), 294–320 (1865)
Knížat, P.: Web scraped data in consumer price indices. Stat. J. IAOS 39, 203–212 (2023). https://doi.org/10.3233/SJI-220115
Knížat, P., Glaser-Opitzová, H.: Consumer price index from web-scraped data: analysis of specific product category. Slov. Štat. Demogr. 33(1), 37–49 (2023)
Polidoro, F., Giannini, R., Lo Conte, R., Mosca, S., Rossetti, F.: Web scraping techniques to collect data on consumer electronics and airfares for Italian HICP compilation. Stat. J. IAOS 31, 165–176 (2015). https://doi.org/10.3233/sji-150901
Stein, C.: Estimation of a covariance matrix. In: Rietz Lecture, 39th IMS Annual Meeting, Atlanta, Georgia (1975)
Szulc, B.: Indices for multiregional comparisons. Prz. Stat. 3, 239–254 (1964)
Tong, T., Wang, C., Wang, Y.: Estimation of variances and covariances for high dimensional data: a selective review. Wires Comput. Stat. 6(4), 255–264 (2014). https://doi.org/10.1002/wics.1308
White, G.: IndexNumR: Index Number Calculation. R package version 0.5.0. https://CRAN.R-project.org/package=IndexNumR (2022)
Acknowledgements
This work was conducted as part of the European project “Dynamic Pricing Model” (311071AA56), which is supported by the European Union and co-financed through the European Regional Development Fund. The authors would also like to acknowledge the support of the grants from the Grant Agency of the Slovak Republic VEGA 1/0047/23 “The importance of spatial spillover effects in the context of the EU's greener and carbon-free Europe priority” and VEGA 1/0561/21 “The impact of the COVID-19 crisis on business demography and employment in the Slovak Republic and the EU”.
Funding
Open access funding provided by The Ministry of Education, Science, Research and Sport of the Slovak Republic in cooperation with Centre for Scientific and Technical Information of the Slovak Republic.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Peter Knížat. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this paper.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Knížat, P., Glaser-Opitzová, H., Furková, A. et al. Multilateral indices in official price statistics and a new additive splicing method. Qual Quant (2024). https://doi.org/10.1007/s11135-024-01848-3
DOI: https://doi.org/10.1007/s11135-024-01848-3