1 Introduction

The collection of product prices through web scraping is a new data source that is being explored by National Statistical Institutes (NSIs). Web scraping makes it possible to obtain, on a daily basis, the prices of individual product items that are offered by several online retailers through a comparison website platform. Daily web scraped prices must be aggregated to a monthly level. Since information on sales for product-offers is not available, the aggregation can only be obtained by applying an unweighted arithmetic or geometric average for each product item in a given month, as proposed in the guide by Eurostat (2020). The guide mainly discusses the legal aspects and information technology requirements of web scraping for official price statistics, and its treatment of the aggregation of web scraped prices is limited. For example, it does not discuss the treatment of web scraped prices downloaded from website platforms that compare product-offers from several online retailers, where multiple prices are observed for the same product item on a given day. Moreover, it gives no guidance on the price filters that are required to clean price data of anomalous or extreme values. Daily web scraped prices can be very volatile due to sales offers and the competitive strategies of online retailers.

The advantages of collecting prices through web scraping compared to semi-automatic detection (“copy and paste”) are studied by Polidoro et al. (2015) using a sample of consumer electronics and airfares for the compilation of the Italian harmonised index of consumer prices (HICP). The authors measure the advantages of web scraping by the amount of workload that can be saved and, based on the results, conclude that by adopting web scraping NSIs can increase the number of elementary product items in the sample and save a considerable amount of time. Moreover, direct control, and rules that can affect prices, can be applied to the selection of elementary product items and of periods with sales offers.

The paper by Glaser-Opitzová (2019) studies the use, in the context of consumer price indices, of price data that are collected directly from high street retailers, also called scanner or transaction data. The advantage of scanner data is that they contain sales and quantities sold for each elementary product item and, therefore, provide a more accurate economic reflection of consumer behaviour. In this case, the consumer price index only captures period-on-period price changes of elementary product items that were sold in the given periods, weighted by their quantities sold. In contrast, web scraping only collects prices of product-offers, which do not contain any information on sales.

However, a substantial share of consumer purchasing, for some products, is carried out online. Hence, NSIs should consider web scraping as a tool for data collection of prices to more accurately reflect the consumer behaviour in price statistics.

Through web scraping, this study obtains a sample of all product items that are available on offer. For product categories with a high attrition rate, NSIs need to select an appropriate price index that captures all new and disappearing product items in monthly samples when estimating the average price changes, so that these reflect economic activity more accurately. Compiling consumer price indices by multilateral methods makes it possible to create more representative monthly samples of product items for product categories with a high attrition rate.

The main objective of this paper is to demonstrate and discuss the application of multilateral methods from the implementation perspective. In particular, the theoretical framework of the Gini–Eltető–Köves–Szulc (GEKS) method, its technique for compiling bilateral indices and its interpretation are studied in detail, which is needed when NSIs implement the GEKS index in a production environment. Moreover, the study proposes a new method for the splice extension that is required to avoid the revision of previously published indices. In the empirical study, we revisit the web scraped data for the product categories washing machines, dryers and dishwashers from the Slovak market, scraped by the Statistical Office of the Slovak Republic and used in the paper by Knížat (2023).

The Statistical Office of the Slovak Republic and the Faculty of Economic Informatics at the University of Economics in Bratislava have conducted a number of studies on the use of web scraped and scanner data in the compilation of consumer price indices. The overall aim of the project is to test alternatives to the data sources that are currently used for price statistics at the Statistical Office of the Slovak Republic. The parallel research consists of testing various economic approaches for estimating price indices that can potentially replace the bilateral consumer price indices, which are used with traditionally collected price data.

The paper is organised as follows: Sect. 2 summarises the current research on multilateral methods in price indices and the most recent guidelines published by the statistical office of the European Union (Eurostat). Section 3 outlines the theoretical framework of the compilation method for the GEKS index for both the rolling and the expanding window approach. Furthermore, it shows two common extension methods that are used to correct the revision problem of multilateral methods. We also propose a new extension method that uses a Stein-type shrinkage estimator to splice the new index onto the already published indices. Section 4 applies the theoretical framework to web scraped data from the Slovak market and discusses the corresponding results. The Conclusions section restates the main objective of this paper and outlines its achievements. It provides a summary and recommendations in the context of official price statistics that can be useful for NSIs.

2 Literature survey

In the recently published guide by Eurostat (2022), multilateral methods for estimating consumer price indices are described in detail. The guide presents, from a practical perspective, the methodology of various multilateral methods, which compare price changes between more than two time periods simultaneously. Its application is centred on scanner data but can also be adapted, in some cases, to web scraped data. Multilateral methods were previously applied to compare prices across countries or regions; their use across time periods was first proposed by Ivancic et al. (2011). Since that proposal, they have been widely studied by academics and practitioners (de Haan 2008; de Haan and Krsinich 2012; Chessa 2016, etc.).

The paper by de Haan and Hendriks (2013) proposes an estimation of consumer price indices with daily frequency using web scraped prices. It states that daily price indices may be quite volatile and that their practical use is questionable since NSIs publish consumer price indices on a monthly basis. It points out an important issue regarding the treatment of prices in regular and sales periods, i.e., regular prices could stay constant over time while sales prices show an upward trend. If consumers prefer to purchase products in the sales period, which is the case shown in de Haan (2008) on supermarket scanner data, it would be more appropriate to consider the change in sales prices in the index compilation. With regard to multilateral methods, the authors demonstrate that the GEKS-Jevons index and the unweighted Time-Product Dummy (TPD) index are likely to have a similar trend, with the TPD index perhaps being a little smoother.

The paper by Glaser-Opitzová (2019) (in Slovak) discusses the implementation of scanner data for price statistics. It presents the current state of price compilation at the Statistical Office of the Slovak Republic, outlines the characteristics of scanner data and shows various methodologies for calculating the price index. In an empirical study for the product category of refrigerators, the paper by Knížat and Glaser-Opitzová (2023) (in Slovak) compares bilateral and multilateral price indices, without drawing a conclusion on which price index is the most appropriate. The paper by Knížat (2023) shows various possibilities for aggregating daily web scraped prices to a monthly level and demonstrates their application on the product categories washing machines, dryers and dishwashers, scraped by the Statistical Office of the Slovak Republic. It also compares the results for bilateral and multilateral price indices. However, it does not reach a conclusion on the most suitable price index for these particular product categories.

The recent paper by Białek (2022) discusses the advantages of scanner data for the calculation of consumer price indices. It concludes that the main advantage is that scanner data contain transaction information on purchases for each product item in the data set. The paper states that a key challenge is the choice of an index formula that reduces chain drift. It presents various multilateral methods for the estimation of price indices and proposes two new multilateral indices that resemble the GEKS method. The author shows that the proposed new multilateral indices satisfy the identity test, which the GEKS index does not.

Multilateral methods suffer from the revision problem that occurs when a new index is calculated, i.e., with the new index, all previous indices in the window are recalculated and changed. A change of previously published indices is not desirable for NSIs. It can be prevented by splicing the new index onto the already published indices. The paper by Chessa (2019) compares the results of various splicing methods and points out that a longer time window could reduce the differences between individual splicing methods. It proposes to use a fixed base splicing method that is more resistant to drift bias.

The authors of this paper assume that the reader has a solid understanding of the theory of price statistics. Readers who require a background in the theory of price indices are recommended to study the monograph by Balk (2008). Moreover, the manual by ILO et al. (2004) provides a comprehensive overview of the theoretical framework of both bilateral and multilateral methods used for the estimation of price indices.

3 Research methodology

The estimation of consumer price indices (CPIs) by multilateral methods compares price changes for more than two time periods simultaneously over a specified time window for a given set of product items. In the recently published Eurostat guide on multilateral methods in the harmonised index of consumer prices, Eurostat (2022), three fundamentally different methods are presented.

The fixed-effects regression model is used for estimating a time-product dummy (TPD) index, where the estimated coefficients of the time-dummy variable corresponding to each time period are the price indices. A similar version of the regression model is used to estimate the hedonic index, which utilises characteristic parameters of individual product items as the regressor variables. The other proposed method is the Geary–Khamis (GK) index, which is a quality-adjusted value index. The estimation method for different versions of the GK index is described in detail in the paper by Chessa (2016).

The third presented method, the GEKS method, is based on the compilation of bilateral indices using a geometric average over a given time window. The bilateral indices are calculated between pairs of periods in the time window and, therefore, matched-item samples are needed between these periods. In this section, the authors demonstrate its compilation with the bilateral Jevons index, which belongs to the family of unweighted indices. The Jevons index satisfies the transitivity condition that is a requirement of the GEKS method; refer to Eurostat (2022) for details. A thorough understanding of the GEKS method is required by NSIs since its individual components might have a profound effect on the published consumer price index. The GEKS compilation method cannot be generalised to other multilateral methods since their estimation and interpretation can be fundamentally different.

3.1 Multilateral price indices

Multilateral methods can be applied by two different approaches, the rolling window or the expanding window. We revisit and study both approaches as applied to the GEKS method. Both approaches can be generalised to other multilateral methods.

The current research, discussed in Sect. 2, mostly concerns the application of the rolling window approach. The guide on multilateral methods by Eurostat (2022) also demonstrates the application of the rolling window approach and only briefly mentions the expanding window approach. However, the compilation of the multilateral index by the expanding window is fundamentally different compared to the rolling window, and, therefore, it can produce significantly different price indices.

We use R software to code the GEKS method. For the rolling window approach, we mostly utilise the package IndexNumR created by White (2022). The expanding window approach, including the proposed additive splicing method, is coded by the authors.

3.1.1 Rolling window GEKS method

We start with an introduction and a thorough analysis of the compilation technique for the GEKS method by the rolling window approach. The GEKS index compares the average price changes between the base period 0 and the current period t. It can be defined as (Eurostat 2022):

$$ I^{0,t} = \mathop \prod \limits_{l = 0}^{T} \left( {P^{0,l} \times P^{l,t} } \right)^{{\frac{1}{T + 1}}} $$
(1)

where \(P^{l,t}\) are bilateral indices calculated between periods l and t, and T defines the time window, usually set to include a full year, T = 12, i.e., a 13-month time window. A sample of product items must be matched between periods l and t in order to calculate the bilateral index. It follows that for l = 0 and l = t, \(P^{0,0} = P^{t,t} = 1\). The GEKS method requires that the bilateral index used in its compilation satisfies the criterion of transitivity (Eurostat 2022), stated here in its time reversal form, \(P^{l,t} = \frac{1}{{P^{t,l} }}\).

In the unweighted case, when using web scraped data, the only bilateral index satisfying this criterion is the Jevons index that is defined as follows (Eurostat 2022):

$$ P^{0,t} = \mathop \prod \limits_{i = 1}^{{N^{0,t} }} \left( {\frac{{p_{i}^{t} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,t} }}}} $$
(2)

where \({p}_{i}^{0}\) and \({p}_{i}^{t}\) are prices of product items at the base period 0 and at the current period t, respectively, for the product items \(i=1,\dots ,N^{0,t}\). The size of the sample of product items matched between the two periods 0 and t is denoted by \(N^{0,t}\). Substituting (2) in (1) and setting T = 12, the GEKS-Jevons index between the base period 0 and the current period t is compiled as:

$$ I^{0,t} = \mathop \prod \limits_{l = 0}^{T} \left( {\mathop \prod \limits_{i = 1}^{{N^{0,l} }} \left( {\frac{{p_{i}^{l} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,l} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{l,t} }} \left( {\frac{{p_{i}^{t} }}{{p_{i}^{l} }}} \right)^{{\frac{1}{{N^{l,t} }}}} } \right)^{\frac{1}{13}} . $$
(3)
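As an illustration of (2) and (3), the following is a minimal Python sketch, not the authors' R implementation. The data layout `prices[t][item]` (one dictionary of item prices per period) and the function names `jevons` and `geks_jevons` are assumptions made for this example; the window is taken to be all periods contained in `prices`.

```python
import math

def jevons(prices, a, b):
    """Bilateral Jevons index P^{a,b} over the items priced in both periods."""
    matched = [i for i in prices[a] if i in prices[b]]
    if not matched:
        raise ValueError(f"no matched items between periods {a} and {b}")
    log_sum = sum(math.log(prices[b][i] / prices[a][i]) for i in matched)
    return math.exp(log_sum / len(matched))

def geks_jevons(prices, base, current):
    """GEKS-Jevons index I^{base,current}; the window is every period in `prices`."""
    window = len(prices)
    log_sum = sum(
        math.log(jevons(prices, base, l) * jevons(prices, l, current))
        for l in range(window)
    )
    return math.exp(log_sum / window)

# Toy data (hypothetical): item "C" is not offered in period 0, item "B" not in period 2.
prices = [
    {"A": 100.0, "B": 200.0},
    {"A": 110.0, "B": 190.0, "C": 50.0},
    {"A": 120.0, "C": 45.0},
]
index_01 = geks_jevons(prices, 0, 1)
```

Each bilateral index is computed on its own matched sample, so an item such as "C" influences `index_01` through the link period 2 even though it has no price in period 0.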

In the compilation of (3), for t = 1, …, T, we need to calculate a matrix of \(13\times 13\) Jevons indices between successive periods l and t and then compile the GEKS index by a geometric average for each period t. To demonstrate the compilation for t = 1, the GEKS-Jevons index between periods 0 and 1 is compiled as:

$$ \begin{aligned} I^{0,1} = & \left( {\left[ {1 \times \mathop \prod \limits_{i = 1}^{{N^{0,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,1} }}}} } \right] \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,1} }}}} \times 1} \right] \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,2} }} \left( {\frac{{p_{i}^{2} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,2} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{2,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{2} }}} \right)^{{\frac{1}{{N^{2,1} }}}} } \right]} \right. \\ & \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,3} }} \left( {\frac{{p_{i}^{3} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,3} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{3,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{3} }}} \right)^{{\frac{1}{{N^{3,1} }}}} } \right] \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,4} }} \left( {\frac{{p_{i}^{4} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,4} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{4,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{4} }}} \right)^{{\frac{1}{{N^{4,1} }}}} } \right] \\ & \left. { \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,5} }} \left( {\frac{{p_{i}^{5} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,5} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{5,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{5} }}} \right)^{{\frac{1}{{N^{5,1} }}}} } \right] \ldots \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,12} }} \left( {\frac{{p_{i}^{12} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,12} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{12,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{12} }}} \right)^{{\frac{1}{{N^{12,1} }}}} } \right]} \right)^{\frac{1}{13}} . \\ \end{aligned} $$
(4)

Clearly, if (4) only included the sample of product items that are matched between periods 0 and 1, this would also fix the matched-item samples for both terms in each square bracket, and, hence, the prices of product items \(p_{i}^{2} ,p_{i}^{3} , \ldots ,p_{i}^{12}\) would cancel each other out. As a result, (4) would simplify to the Jevons index in (2), \(I^{0,1} = \left( {\mathop \prod \limits_{i = 1}^{{N^{0,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{0} }}} \right)^{{\frac{13}{{N^{0,1} }}}} } \right)^{\frac{1}{13}} = P^{0,1}\). Similarly for periods t = 2, …, 12, if we assume matched-item samples between periods 0 and 2, 0 and 3, …, 0 and 12, we get the following Jevons indices: \(I^{0,2} = \left( {\mathop \prod \limits_{i = 1}^{{N^{0,2} }} \left( {\frac{{p_{i}^{2} }}{{p_{i}^{0} }}} \right)^{{\frac{13}{{N^{0,2} }}}} } \right)^{\frac{1}{13}} = P^{0,2} ,I^{0,3} = \left( {\mathop \prod \limits_{i = 1}^{{N^{0,3} }} \left( {\frac{{p_{i}^{3} }}{{p_{i}^{0} }}} \right)^{{\frac{13}{{N^{0,3} }}}} } \right)^{\frac{1}{13}} = P^{0,3} , \ldots ,I^{0,12} = \left( {\mathop \prod \limits_{i = 1}^{{N^{0,12} }} \left( {\frac{{p_{i}^{12} }}{{p_{i}^{0} }}} \right)^{{\frac{13}{{N^{0,12} }}}} } \right)^{\frac{1}{13}} = P^{0,12}\), respectively.
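The cancellation can be written out for a single link period l. The line below is only a restatement of the argument above, under the assumption that the same N product items are matched in all three periods 0, l and 1 (the symbol N without superscripts is introduced here for that common sample):

$$ P^{0,l} \times P^{l,1} = \mathop \prod \limits_{i = 1}^{N} \left( {\frac{{p_{i}^{l} }}{{p_{i}^{0} }}} \right)^{\frac{1}{N}} \times \mathop \prod \limits_{i = 1}^{N} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{l} }}} \right)^{\frac{1}{N}} = \mathop \prod \limits_{i = 1}^{N} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{0} }}} \right)^{\frac{1}{N}} = P^{0,1} , \quad {\text{so}} \quad I^{0,1} = \left( {\mathop \prod \limits_{l = 0}^{12} P^{0,1} } \right)^{\frac{1}{13}} = P^{0,1} . $$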

Hence, a difference between the Jevons and the GEKS index is driven by the unmatched product items between periods 0 and t. In (4), these unmatched product items enter the compilation from the third term as \(\left[ {\frac{{p_{i}^{2} }}{{p_{i}^{0} }} \times \frac{{p_{i}^{1} }}{{p_{i}^{2} }}} \right], \left[ {\frac{{p_{i}^{3} }}{{p_{i}^{0} }} \times \frac{{p_{i}^{1} }}{{p_{i}^{3} }}} \right], \ldots ,\left[ {\frac{{p_{i}^{12} }}{{p_{i}^{0} }} \times \frac{{p_{i}^{1} }}{{p_{i}^{12} }}} \right]\) within each corresponding square bracket term.

The GEKS-Jevons index in (4), compared between periods 0 and 1, is also affected by average price changes between periods 0 and 2, 2 and 1, 0 and 3, 3 and 1, …, 0 and 12, 12 and 1, respectively. Clearly, the index would not be interpretable from an economic point of view. However, in the production environment, the first published GEKS index is at t = T + 1, which only compares the current period with previous periods. Noting that the rolling window of 13 months is shifted forward by 1 month, so that t = 1, …, T, T + 1, the GEKS-Jevons index between the base period 1 and the first published period 13 can be expressed as:

$$ \begin{aligned} I^{1,13} = & \left( {\left[ {1 \times \mathop \prod \limits_{i = 1}^{{N^{1,13} }} \left( {\frac{{p_{i}^{13} }}{{p_{i}^{1} }}} \right)^{{\frac{1}{{N^{1,13} }}}} } \right] \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{1,2} }} \left( {\frac{{p_{i}^{2} }}{{p_{i}^{1} }}} \right)^{{\frac{1}{{N^{1,2} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{2,13} }} \left( {\frac{{p_{i}^{13} }}{{p_{i}^{2} }}} \right)^{{\frac{1}{{N^{2,13} }}}} } \right] \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{1,3} }} \left( {\frac{{p_{i}^{3} }}{{p_{i}^{1} }}} \right)^{{\frac{1}{{N^{1,3} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{3,13} }} \left( {\frac{{p_{i}^{13} }}{{p_{i}^{3} }}} \right)^{{\frac{1}{{N^{3,13} }}}} } \right]} \right. \\ & \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{1,4} }} \left( {\frac{{p_{i}^{4} }}{{p_{i}^{1} }}} \right)^{{\frac{1}{{N^{1,4} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{4,13} }} \left( {\frac{{p_{i}^{13} }}{{p_{i}^{4} }}} \right)^{{\frac{1}{{N^{4,13} }}}} } \right] \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{1,5} }} \left( {\frac{{p_{i}^{5} }}{{p_{i}^{1} }}} \right)^{{\frac{1}{{N^{1,5} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{5,13} }} \left( {\frac{{p_{i}^{13} }}{{p_{i}^{5} }}} \right)^{{\frac{1}{{N^{5,13} }}}} } \right] \ldots \\ & \left. { \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{1,13} }} \left( {\frac{{p_{i}^{13} }}{{p_{i}^{1} }}} \right)^{{\frac{1}{{N^{1,13} }}}} \times 1} \right]} \right)^{\frac{1}{13}} . \\ \end{aligned} $$
(5)

Similarly, in (5), a difference between the Jevons and the GEKS index is driven by the unmatched product items between periods 1 and 13, \(\left[\frac{{p}_{i}^{2}}{{p}_{i}^{1}}\times \frac{{p}_{i}^{13}}{{p}_{i}^{2}}\right], \left[\frac{{p}_{i}^{3}}{{p}_{i}^{1}}\times \frac{{p}_{i}^{13}}{{p}_{i}^{3}}\right],\dots ,\left[\frac{{p}_{i}^{12}}{{p}_{i}^{1}}\times \frac{{p}_{i}^{13}}{{p}_{i}^{12}}\right]\). Under the unmatched-item sample assumption, the GEKS index at t = 13 is affected by average price changes in previous periods. This implies that, from the practical perspective, NSIs would encounter challenges in the interpretation of period-on-period average price changes. For example, if the average price change between compared periods 1 and 13 (1 = January 2021 and 13 = January 2022) was mainly driven by the unmatched product items compared between periods 1 and 3 and 1 and 10, the statement would read: The year-on-year inflation in January 2022 of 2% was mainly driven by the average price increase of product items between January 2021 and March 2021 and January 2021 and October 2021.

It is evident that the above statement does not make any economic sense. Therefore, with the implementation of the GEKS method for official price statistics, NSIs require a fundamental amendment of their methodological framework, including the change of the interpretation of consumer price indices. A statement of the simple period-on-period average price changes might no longer be valid.

The other drawback of the GEKS method, and all other multilateral methods, is that the compilation of the new index at T + 1 causes a revision of previous indices in the window. This can be overcome by splicing the new index at T + 1 onto the indices compiled in the previous window, which can also be interpreted as redefining the base period back to 0 in the case of the rolling window approach. The use of splicing methods is discussed in Sect. 3.2.

3.1.2 Expanding window GEKS method

In the expanding window approach, the time window T is not fixed but expands, so t = T and the base period always remains at t = 0. It implies that for the first time period t = 1, the GEKS-Jevons index simplifies to the Jevons index.

For the second period t = 2, so that T = 2 with a 3-month window, the compilation of the GEKS-Jevons index is expressed as follows:

$$ \begin{aligned} I^{0,2} = & \left( {\left[ {1 \times \mathop \prod \limits_{i = 1}^{{N^{0,2} }} \left( {\frac{{p_{i}^{2} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,2} }}}} } \right] \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,1} }} \left( {\frac{{p_{i}^{1} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,1} }}}} \times \mathop \prod \limits_{i = 1}^{{N^{1,2} }} \left( {\frac{{p_{i}^{2} }}{{p_{i}^{1} }}} \right)^{{\frac{1}{{N^{1,2} }}}} } \right]} \right. \\ & \left. { \times \left[ {\mathop \prod \limits_{i = 1}^{{N^{0,2} }} \left( {\frac{{p_{i}^{2} }}{{p_{i}^{0} }}} \right)^{{\frac{1}{{N^{0,2} }}}} \times 1} \right]} \right)^{\frac{1}{3}} . \\ \end{aligned} $$
(6)

The same applies for t = 3, …, 12. The window can be reset at t = 12, or it can be expanded further, up to 25 months. Conclusions similar to those for the rolling window approach can be drawn for the matched and unmatched product items. An advantage of the expanding window approach could be its interpretation since, if the window is reset at t = 12, it only compares average price changes for periods in the same year. In the empirical analysis in Sect. 4, we test the application of both approaches using web scraped prices.
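The expanding window compilation can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R code: the data layout `prices[t][item]` and the names `jevons` and `expanding_geks` are hypothetical, no reset of the window and no splice is applied, and the function returns only the current-period index from each successive window.

```python
import math

def jevons(prices, a, b):
    """Bilateral Jevons index P^{a,b} on the items priced in both periods.
    Raises ZeroDivisionError if the two periods share no items."""
    matched = [i for i in prices[a] if i in prices[b]]
    return math.exp(
        sum(math.log(prices[b][i] / prices[a][i]) for i in matched) / len(matched)
    )

def expanding_geks(prices):
    """Return [I^{0,1}, I^{0,2}, ...], where I^{0,t} is compiled on the window 0..t."""
    series = []
    for t in range(1, len(prices)):
        # The window holds t + 1 periods, so the root is 1 / (t + 1), as in (6) for t = 2.
        logs = [
            math.log(jevons(prices, 0, l) * jevons(prices, l, t))
            for l in range(t + 1)
        ]
        series.append(math.exp(sum(logs) / (t + 1)))
    return series

# Toy data (hypothetical) with one appearing and one disappearing item.
prices = [
    {"A": 100.0, "B": 200.0},
    {"A": 110.0, "B": 190.0, "C": 50.0},
    {"A": 120.0, "C": 45.0},
]
series = expanding_geks(prices)
```

For t = 1 the window contains only periods 0 and 1, so the first entry of `series` coincides with the bilateral Jevons index, in line with the remark at the start of this subsection.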

Again, a problem of the revision occurs since, for example, in the case of t = 2, the index at t = 1 is recalculated. As mentioned previously, this can be overcome by the application of the splice extension. In the following section, we outline three splicing methods that can be adapted to both rolling and expanding window approaches.

3.2 Splicing methods

To overcome the problem of the revision of previously compiled multilateral indices, we recast two common splicing methods that can be adapted to both rolling and expanding window approaches. Moreover, we propose a new splicing method that uses a shrinkage estimator for creating a link between the new index and the previously compiled index series.

The movement splice uses only a previously calculated index for period T as the link period for the new T + 1 index. For both rolling and expanding window approaches, it can be defined as (Eurostat 2022):

$$ I^{0,T + 1} = I^{0,T} \times \left( {\frac{{I^{1,T + 1} }}{{I^{1,T} }}} \right). $$
(7)
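Equation (7) amounts to a single multiplication, as the following minimal Python sketch shows. The list layout (`published` holding \(I^{0,0}, \ldots, I^{0,T}\) and `new_window` holding \(I^{1,1}, \ldots, I^{1,T+1}\)) and the function name are assumptions made for illustration, and the numbers are invented.

```python
def movement_splice(published, new_window):
    """Extend the published series by one period via the movement splice (7).

    published:  I^{0,0}, ..., I^{0,T}    (indices from the previous window)
    new_window: I^{1,1}, ..., I^{1,T+1}  (indices from the window shifted by one period)
    """
    # Period-on-period movement in the new window: I^{1,T+1} / I^{1,T}
    movement = new_window[-1] / new_window[-2]
    return published[-1] * movement

# Hypothetical index levels for a 3-period window (T = 2).
published = [1.0, 1.02, 1.05]
new_window = [1.0, 1.03, 1.04]
spliced = movement_splice(published, new_window)
```

Only the last two entries of the new window are used, which is why the movement splice leaves all previously published indices untouched.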

The mean splice takes into account all previously compiled index series as link periods for the new T + 1 index. For the rolling window approach, the new index T + 1, using a 13-month window, is spliced as:

$$ I^{0,T + 1} = \left[ {\left( {I^{0,T - 11} \times \left( {\frac{{I^{1,T + 1} }}{{I^{1,T - 11} }}} \right)} \right) \times \left( {I^{0,T - 10} \times \left( {\frac{{I^{1,T + 1} }}{{I^{1,T - 10} }}} \right)} \right) \times \left( {I^{0,T - 9} \times \left( {\frac{{I^{1,T + 1} }}{{I^{1,T - 9} }}} \right)} \right) \cdots \times \left( {I^{0,T} \times \left( {\frac{{I^{1,T + 1} }}{{I^{1,T} }}} \right)} \right) } \right]^{{\frac{1}{T}}} . $$
(8)

Equation (8) is a geometric average of the indices compiled in the previous window, each weighted by the ratio of the new index to the corresponding previous index compiled in the new window. Equation (8) can similarly be adapted to the expanding window approach, where the number of terms in the round brackets increases with T. However, for the expanding window approach, this could create an overly smoothed index series, which is examined in the empirical study in Sect. 4.
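A minimal Python sketch of the mean splice is given below. It assumes that the root is taken over the number of link periods actually used, which is what makes the result a geometric average; the list layout and the function name are hypothetical, and the numbers are invented.

```python
import math

def mean_splice(published, new_window):
    """Geometric mean of the splices through every overlapping link period.

    published:  I^{0,0}, ..., I^{0,T}    (indices from the previous window)
    new_window: I^{1,1}, ..., I^{1,T+1}  (indices from the window shifted by one period)
    """
    new_index = new_window[-1]  # I^{1,T+1}
    # published[t] and new_window[t - 1] both refer to period t, for t = 1..T.
    links = [
        published[t] * new_index / new_window[t - 1]
        for t in range(1, len(published))
    ]
    return math.exp(sum(math.log(x) for x in links) / len(links))

# Hypothetical index levels for a 3-period window (T = 2).
published = [1.0, 1.02, 1.05]
new_window = [1.0, 1.03, 1.04]
spliced = mean_splice(published, new_window)
```

If the previous window were fully consistent with the new one, every link would give the same value and the mean splice would coincide with the movement splice; the averaging only matters when the links disagree.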

We propose a new splicing method that is derived based on the Stein-type shrinkage estimator (Stein 1975). The Stein-type shrinkage estimator was originally developed to correct a sample covariance matrix bias for high-dimensional data by shrinking its largely dispersed eigenvalues, refer to Tong et al. (2014) for more details. The new splice is additive and it can be defined as:

$$ I^{0,T + 1} = I^{0,T} + \lambda^{1,T + 1} $$
(9)

where \(\lambda^{1,T + 1}\) is a shrinkage estimator. For the rolling window approach, we propose the following shrinkage estimator, which is an adjusted version of the estimator presented in the paper by Tong et al. (2014):

$$ \lambda^{1,T + 1} = \frac{{\sqrt {T + 1} }}{{\mathop \sum \nolimits_{t = 1}^{T} \frac{{I^{0,t} }}{{I^{1,T + 1} - I^{1,T + 1 - t} }}}}. $$
(10)

In the denominator of (10), each GEKS index compiled in the previous window is divided by the difference between the new index \(I^{1,T + 1}\) and a previous index \(I^{1,T + 1 - t}\) compiled in the new window, and these ratios are then summed. The numerator of (10) is kept fixed at the square root of the window length. The estimator can also be adapted to the expanding window approach, where the number of terms in the denominator grows with the expanding window length T.
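Equations (9) and (10) can be sketched in Python as follows. This is an illustrative reading of the two formulas, not the authors' R implementation: the list layout, the function name and the invented numbers are assumptions, and index levels are taken to be expressed as ratios (1.0 at the base period).

```python
import math

def additive_splice(published, new_window):
    """Additive splice (9) with the shrinkage term (10).

    published:  I^{0,0}, ..., I^{0,T}    (indices from the previous window)
    new_window: I^{1,1}, ..., I^{1,T+1}  (indices from the window shifted by one period)
    """
    T = len(published) - 1
    new_index = new_window[-1]  # I^{1,T+1}
    denominator = 0.0
    for t in range(1, T + 1):
        # new_window[T - t] is I^{1,T+1-t}; a zero difference raises ZeroDivisionError.
        diff = new_index - new_window[T - t]
        denominator += published[t] / diff
    shrinkage = math.sqrt(T + 1) / denominator  # lambda^{1,T+1} in (10)
    return published[-1] + shrinkage            # equation (9)

# Hypothetical index levels for a 3-period window (T = 2).
published = [1.0, 1.02, 1.05]
new_window = [1.0, 1.03, 1.04]
spliced = additive_splice(published, new_window)
```

The sketch makes the sensitivity of (10) visible: the differences in the denominator enter as reciprocals, so the size of the shrinkage term depends strongly on how far the new index lies from the earlier indices in the new window, and the term is undefined when any single difference is exactly zero.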

The splicing method in (9) differs significantly from the other two splicing methods since it is additive rather than multiplicative. It should be noted that this is the first application of this new type of additive splicing method. It requires further theoretical and analytical research, which is left to a separate study.

Furthermore, it is a well-known fact that after the application of the splice, multilateral indices do not meet the criterion of transitivity, \(I^{0,t} \ne \frac{1}{{I^{t,0} }}\). In the case of a bilateral index, this violation would normally invalidate the use of the index for official price statistics.

4 Results and discussions

We use web scraped data in the empirical study of the theoretical framework outlined in the previous sections. The daily web scraped prices of product categories washing machines, dryers and dishwashers (all in ECOICOP 5-digit: 05.3.1.2, refer to Eurostat, RAMON) are available from 01 December 2019 until 31 December 2021. The information was scraped from the website https://www.heureka.sk/, which is an online platform that offers various products on sale from different retailers. We collect multiple daily prices for each product item for a particular day.

Hence, before the compilation of price indices, an aggregation of daily prices to a monthly level is required. We use the geometric average for the aggregation of product items’ prices, first on a daily and then on a monthly level. The aggregation by the geometric average reduces the daily price volatility, which is sometimes undesirable. For more details on the analysis of daily price volatility and the aggregation of daily prices, refer to the paper by Knížat (2023), which uses the same set of web scraped data in its case study.
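The two-step aggregation can be sketched in Python as follows. The record layout `(date, item_id, price)` with ISO dates, the function names and the sample offers are assumptions made for illustration; no price filter is applied.

```python
import math
from collections import defaultdict

def geometric_mean(values):
    """Unweighted geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def monthly_prices(offers):
    """Aggregate offers to one price per (month, item).

    offers: iterable of (date as 'YYYY-MM-DD', item_id, price).
    Step 1 averages the offers of an item within a day,
    step 2 averages the resulting daily prices within a month.
    """
    by_day = defaultdict(list)
    for date, item, price in offers:
        by_day[(date, item)].append(price)
    by_month = defaultdict(list)
    for (date, item), day_prices in by_day.items():
        by_month[(date[:7], item)].append(geometric_mean(day_prices))
    return {key: geometric_mean(vals) for key, vals in by_month.items()}

# Hypothetical offers: two retailers on the first day, one on the second.
offers = [
    ("2021-01-01", "A", 100.0),
    ("2021-01-01", "A", 400.0),
    ("2021-01-02", "A", 50.0),
]
monthly = monthly_prices(offers)
```

Averaging within each day first means that a day with many offers does not carry more weight in the monthly price than a day with a single offer.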

The following figures display the consumer price indices for rolling and expanding window approaches, respectively, of the above mentioned product categories.

Figure 1 shows the consumer price indices for the product category washing machines with December 2020 as the base period. Note that the rolling window GEKS-Jevons index is rebased to December 2020 since its compilation requires a period of 13 months prior to the implementation date; this period is also shown in Fig. 1, starting from December 2019. The fixed base Jevons and the chained Jevons index are calculated with December 2020 as the base period. It is evident that the chained Jevons index suffers from a downward chain drift when compared to the fixed base Jevons index. The GEKS-Jevons indices extended by the movement and mean splices are very similar, staying approximately in the middle between the fixed base Jevons and the chained Jevons index. The GEKS-Jevons index extended by the proposed additive splice does not always follow the same trend as the other indices.

Fig. 1 Rolling window consumer price indices: washing machines. Source: Own construction

Similarly, Fig. 2 shows the consumer price indices for the product category washing machines but with an expanding window when compiling the multilateral GEKS-Jevons indices. The expansion of T is carried out from January 2020 until December 2021 over a 25-month window, with December 2019 as the base period. The mean splice extension of the GEKS-Jevons index is not suitable since it creates an overly smoothed index series. The downward chain drift of the chained Jevons index starts in April 2021 and is not present prior to this date. This can indicate that the stability of the product items in monthly samples could play an important role in the drift, i.e., the addition of new items or the attrition of old items might exacerbate the drift. The GEKS-Jevons index extended by the proposed additive splice stays slightly above the other splicing methods. It follows the trend of the other indices, except for two peaks in June 2020 and March 2021.

Fig. 2. Expanding window consumer price indices, washing machines. Source: own construction.
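The role of item turnover in the divergence between the chained and the fixed base Jevons indices can be illustrated with a small sketch. It assumes that each bilateral Jevons index is computed over the items matched in both periods; the item names and prices are toy values chosen for illustration, not the web scraped data used in the paper.

```python
import math


def jevons(prices_a, prices_b):
    """Bilateral Jevons index from period a to b over items present in both."""
    matched = sorted(set(prices_a) & set(prices_b))
    if not matched:
        raise ValueError("no matched items between the two periods")
    log_sum = sum(math.log(prices_b[i] / prices_a[i]) for i in matched)
    return math.exp(log_sum / len(matched))


def fixed_base(prices, periods):
    """Compare every period directly with the first (base) period."""
    base = periods[0]
    return [100.0 * jevons(prices[base], prices[t]) for t in periods]


def chained(prices, periods):
    """Multiply successive period-on-period Jevons indices."""
    out = [100.0]
    for prev, cur in zip(periods, periods[1:]):
        out.append(out[-1] * jevons(prices[prev], prices[cur]))
    return out


# Toy data: item "B" enters in month 1 at a high price and is discounted in month 2.
prices = {
    0: {"A": 100.0, "C": 50.0},
    1: {"A": 100.0, "C": 50.0, "B": 200.0},
    2: {"A": 100.0, "C": 50.0, "B": 100.0},
}
periods = [0, 1, 2]

fb = fixed_base(prices, periods)  # uses only items matched with the base period
ch = chained(prices, periods)     # picks up the price fall of the new item "B"
```

In this toy case the fixed base index stays at 100 because item "B" is never matched with the base period, while the chained index falls in month 2 because the discount on the newly added item enters the month-on-month link. With a constant set of items the two series would coincide, since the Jevons index is transitive on a fixed item set.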

Overall, the expanding window GEKS-Jevons index with the movement splice extension seems to be closer to the fixed base Jevons index than the rolling window GEKS-Jevons indices are. This could indicate that it further reduces the downward drift that is characteristic of chained indices.

Figure 3 shows the consumer price indices for the product category dryers; the base period and rebasing are defined as in Fig. 1. Similar conclusions as for Fig. 1 can be drawn, except that the rolling window GEKS-Jevons index extended by the additive splice suffers from a significant upward drift.

Fig. 3. Rolling window consumer price indices, dryers. Source: own construction.

Figure 4 shows the consumer price indices for the product category dryers with an expanding window when compiling the multilateral GEKS-Jevons indices; the base period and expansion window are defined as in Fig. 2. Again, the mean splice method is not suitable for the extension. Similar remarks as for Fig. 2 apply to the proposed additive splicing method. It suffers from two “high-peak” periods, which are due to its sensitivity to the difference between the current and some of the previous indices that appears in the denominator of (10), i.e., a large difference might cause a jump in the spliced index.

Fig. 4. Expanding window consumer price indices, dryers. Source: own construction.

Figure 5 shows the consumer price indices for the product category dishwashers; the base period and rebasing are defined as in Fig. 1. Again, the chained Jevons index has a tendency to drift downward compared to the fixed base Jevons index, albeit with a smaller magnitude than in the previous product categories. The movement and mean splicing methods are close to the fixed base Jevons index, with the last three index series staying above it. The proposed additive splicing method shows irregular behaviour.

Fig. 5. Rolling window consumer price indices, dishwashers. Source: own construction.

Figure 6 shows the consumer price indices for the product category dishwashers with an expanding window when compiling the multilateral GEKS-Jevons indices; the base period and expansion window are defined as in Fig. 2. The proposed additive splicing method shows a similar trend to the other displayed indices and mostly stays slightly above the fixed base Jevons index. Similar conclusions can be drawn as in the expanding window approach for the two previous product categories.

Fig. 6. Expanding window consumer price indices, dishwashers. Source: own construction.

In general, we can conclude that the expanding window GEKS-Jevons index, extended by the movement splice, has a tendency to further reduce the downward chain drift when compared to the rolling window GEKS-Jevons index. An important observation is that the downward chain drift becomes more evident from the year 2020, which covers the economic period affected by the lockdown of physical outlets. In this period, online shopping acted as a substitute for physical outlets, which might have caused more volatile replacement rates in the monthly samples of product items.
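The difference between the two window types under the movement splice can be sketched as follows. The sketch assumes the standard form of the GEKS index (a geometric mean, over all link periods in the window, of the products of two bilateral Jevons indices) and the standard movement splice (the latest published index multiplied by the latest period-on-period GEKS movement from the new window). It does not reproduce the paper's own equations; the function names and the toy prices are hypothetical.

```python
import math


def jevons(prices_a, prices_b):
    """Bilateral Jevons index from period a to b over items present in both."""
    matched = sorted(set(prices_a) & set(prices_b))
    if not matched:
        raise ValueError("no matched items between the two periods")
    log_sum = sum(math.log(prices_b[i] / prices_a[i]) for i in matched)
    return math.exp(log_sum / len(matched))


def geks(prices, window, s, t):
    """GEKS-Jevons index from s to t using every period in `window` as a link."""
    logs = [
        math.log(jevons(prices[s], prices[l]) * jevons(prices[l], prices[t]))
        for l in window
    ]
    return math.exp(sum(logs) / len(logs))


def movement_splice(prices, periods, first_window, rolling=True):
    """Extend a GEKS-Jevons series by the movement splice.

    rolling=True keeps a window of fixed length `first_window`;
    rolling=False lets the window grow with every new period.
    """
    window = periods[:first_window]
    index = [100.0 * geks(prices, window, periods[0], t) for t in window]
    for k in range(first_window, len(periods)):
        if rolling:
            window = periods[k - first_window + 1 : k + 1]
        else:
            window = periods[: k + 1]
        # Only the last published index and the newest movement are used.
        index.append(index[-1] * geks(prices, window, periods[k - 1], periods[k]))
    return index


# Toy data with item turnover; item "A" is present throughout (illustration only).
prices = {
    "2020-01": {"A": 100, "B": 200},
    "2020-02": {"A": 100, "B": 180, "C": 300},
    "2020-03": {"A": 95, "C": 270},
    "2020-04": {"A": 95, "C": 240, "D": 150},
}
periods = sorted(prices)

rolling = movement_splice(prices, periods, first_window=3, rolling=True)
expanding = movement_splice(prices, periods, first_window=3, rolling=False)
```

Both series share the same values over the first window and differ only in the spliced periods, where the expanding window keeps the earliest months as link periods while the rolling window drops them. Already published values are never revised in either case, which is the property that makes the movement splice easy to interpret.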

The proposed additive splicing method shows some irregular behaviour, particularly when used in the extension of the rolling window GEKS-Jevons index. It is mainly affected by large differences between the current and previous index series, which form the denominator of (10). Therefore, a further analytical study of the shrinkage estimator is required, with possible adjustments that would correct the aforementioned sensitivity. However, the method shows potential to further reduce the downward chain drift, since it has a tendency to stay close to the fixed base Jevons index.

5 Conclusions

The main objective of this paper is to show the intricacy of multilateral methods in the compilation of consumer price indices when used for official price statistics. In particular, we provide a comprehensive overview of the GEKS method, which combines bilateral Jevons indices calculated for the successive periods in the defined window. The paper shows that the published GEKS-Jevons index at the current period can be affected by previous bilateral Jevons index series in the given window. It follows that a challenge for NSIs is to publicly disseminate such consumer price indices in the context of the fundamental definition of period-on-period average price changes. The solution is an amendment of the methodological framework for official price statistics that would include changing the definition of the consumer price index. However, the interpretation of different multilateral indices varies, and it is not always easy for a non-expert to understand.

In the empirical study, we show the application of the GEKS-Jevons index for both rolling and expanding window approaches using web scraped data for three different product categories. The expanding window approach seems more likely to further reduce the downward chain drift than the rolling window approach. The paper shows that the choice of a suitable price index could be different for each product category. It should be mainly driven by the replacement rate in the monthly samples of product items and by the empirical testing of different price indices for the particular product category.

Moreover, the revision problem of already published index series when compiling a new index at T + 1 using multilateral methods brings additional computational and practical challenges. There are numerous splicing methods that can be used to correct the revision problem, but it is not always straightforward to decide which one is the most appropriate. In the expanding window approach, the movement splice is preferable; it is also easy to calculate and interpret, since it only uses the latest published index in the extension. Moreover, we propose a new splice of an additive character, which might have the potential to further reduce the downward chain drift. However, since it is new in the context of the extension of multilateral price indices, it requires further theoretical and analytical research.

We conclude that new data sources for price statistics bring the opportunity to capture price changes in the economy more accurately and that, in some cases, this requires replacing bilateral indices with multilateral methods in the calculation of consumer price indices. Again, we emphasise that a thorough data analysis of each product category is required to assess whether the application of multilateral methods is the most suitable.

The research in this paper can be extended to similarly evaluate the application of other multilateral methods in practice. The estimation methodology of the Geary–Khamis and Time-Product Dummy indices differs significantly from the compilation technique of the GEKS method. Moreover, additional splicing methods that are adaptable to both rolling and expanding window approaches can be explored.