1 Introduction

Accurate near-term inflation forecasts are important to most central banks, as they contribute towards a more precise assessment of the economic outlook and an appropriate policy stance. Inflation also usually has a significant influence over the evolution of the short-term interest rate, which in turn affects the activity of a diverse set of economic agents. In addition, such forecasts help to anchor inflation expectations, which may improve policy efficacy and economic stability and thereby provide a firmer foundation for economic growth. For these reasons, it is important that central banks continue to assess the relative performance of different forecasting approaches and information sets. More recently, such investigations have considered the use of statistical learning methodologies that utilise different types of big data to inform policy decisions. Studies that make use of these methodologies have also influenced policy decisions that consider the impact of the COVID-19 pandemic on economic activity.

The relative accuracy of inflation forecasts has been considered in a number of important studies. In an early investigation, Stock and Watson (2007) suggest that relatively simple univariate models provide competitive forecasts of inflation in the USA, with a specification that incorporates both unobserved components and stochastic volatility performing particularly well. Similarly, Faust and Wright (2013) also advocate for the use of a relatively straightforward approach and suggest that judgement forecasts, such as those from the Federal Reserve or inflation expectation surveys, tend to be more accurate than various forecasting model predictions. These findings, which largely relate to a period of stable economic activity for a developed economy, should not be too surprising, as the conditional mean of inflation was highly persistent (Fuhrer 2010; Wolters and Tillmann 2015). However, over periods where economic activity is not particularly stable, a forecast that is close to the previously observed mean may not be particularly accurate, and models may need to allow for sustained departures from steady-state values. Hence, it may be necessary to make a number of amendments to traditional forecasting models during a period of economic crisis, or where the rate of inflation is relatively variable, as in the case of several low- and middle-income countries.

To address some of the challenges that may arise when forecasting macroeconomic variables following a significant structural change or large departure from steady-state values, Galvao (2021) summarises a number of developments from the international literature, while Castle et al. (2021) and Coulombe et al. (2021) note that statistical learning models that are able to adapt to various changes may perform better than well-specified structural models. In addition, statistical learning models that incorporate nonparametric nonlinear features have gained significant attention recently, partially due to the fact that they may be applied to large datasets to yield impressive results. For example, Medeiros et al. (2021) make use of nonlinear statistical learning models that are able to learn complex unknown functional forms, which may be useful when there are potential structural changes in both the mean and trend, to forecast inflation. Their results suggest that these techniques may provide superior forecasts over medium to longer horizons, when making use of the large macroeconomic dataset for the USA (the construction of which is described in McCracken and Ng (2016)). Coulombe et al. (2022) confirm these results and suggest that the statistical learning models that are able to incorporate nonparametric nonlinear features are responsible for the most significant performance gains, when comparing the predictions of a large suite of different forecasting approaches.

In addition to the above, a number of similar studies consider the relative merits of statistical learning approaches that seek to summarise all the available information (which is contained in the set of potential predictors), as opposed to only selecting those variables from the set of potential predictors that provide useful predictive power (i.e. the density versus sparsity debate). Giannone et al. (2021) suggest that when making use of various macroeconomic and financial datasets for the USA, the forecasts of dense models are more accurate than those of their sparse counterparts. Similar arguments are made in Coulombe et al. (2022), who note that the use of sparse techniques usually contributes towards a substantial decline in forecasting accuracy. However, the results that are contained in Joseph et al. (2021) note that when restricting the subset of potential predictors, which incorporate disaggregated consumption price indices for the UK, the sparse models provide more impressive results.

In this paper, we make use of four broad categories of models to predict future measures of inflation. The first of these comprises the benchmark models, which include traditional random walk, autoregressive and Bayesian vector autoregressive (BVAR) specifications. In addition, we also include the forecasts from the South African Reserve Bank (SARB) disaggregated inflation model (DIM), which is largely responsible for influencing the monthly near-term inflation forecasts, along with the actual monthly inflation forecasts that are presented to the monthly Monetary Policy Committee (MPC) meetings. The second group of models makes use of dimensionality reduction techniques that seek to summarise all of the data from the potential predictors, and includes frameworks that are based upon principal component analysis. The third group of models makes use of variable selection techniques, which include methods that employ shrinkage estimators, penalised likelihood functions or Bayesian model selection techniques. Finally, the fourth group includes nonlinear statistical learning forecasting models, such as the random forest and neural network, which may also incorporate nonparametric features.

These models are applied to consumer price data that is measured at different levels of aggregation, as collected by Statistics South Africa (StatsSA) to construct the South African Consumer Price Index (CPI). The raw dataset incorporates prices for 34,075 unique goods and services, which were collected by fieldworkers who are dispersed across the country. Through various methods of aggregation and with the assistance of StatsSA, we were then able to reconstruct the set of 216 disaggregated predictors for the period between January 2009 and March 2021. In addition, we also make use of the publicly available dataset for CPI that is measured at a slightly higher level of aggregation (and includes data for 46 different items). Both datasets therefore incorporate a period of 12 months over which various lockdown measures were imposed. When measured at higher levels of disaggregation, it has been suggested that such a dataset contains information that relates to the idiosyncratic behaviour of consumer prices, where the frequency and dispersion of price adjustments can vary across items and over time (Chu et al. 2018; Petrella et al. 2019; Stock and Watson 2020; Chetty et al. 2020; Carvalho et al. 2020; Cavallo 2020). Given these characteristics of the data, it is conceivable that when the price indices are subjected to various forms of aggregation, their predictive power may decline. For example, if the disaggregated price index for brown bread has impressive predictive power, while the other products in the category for breads and cereals are poor predictors, then the signal that is provided by brown bread may be obscured if we were to restrict the analysis to the aggregate data for the category rather than the individual goods.
Previous findings in Hubrich and Hendry (2005) suggest that the use of disaggregated CPI components for the USA does not result in a meaningful improvement in forecasting accuracy, while studies that were conducted for Mexico and Portugal suggest that the use of disaggregated components could provide notable improvements (Ibarra 2012; Duarte and Rua 2007).

Our results suggest that despite the limitations of the data, which largely pertain to the number of available observations, the combined use of big data and statistical learning methods provides results that are potentially able to compete with most benchmarks over medium to longer horizons. However, many of the traditional benchmarks are superior over shorter horizons. In addition, after making use of data that is measured at different levels of aggregation, we note that the use of more disaggregated data results in an improved forecasting performance over all horizons. The results also suggest that the forecasts of several sparse models are superior to those of dense models, when making use of more disaggregated data. For example, both the least absolute shrinkage and selection operator (LASSO) and the ridge regression provide results that are superior to the dynamic factor models, over most horizons, when using data for headline inflation. Hence, there would appear to be advantages to identifying those variables that contribute towards the underlying predictive signal in the data, by restricting the information that is used in the construction of the forecast to those variables that have substantive predictive power. Furthermore, we also note that the relative performance of the statistical learning methods is more impressive when the rate of inflation deviates from its steady state during the period that incorporated a number of economic lockdowns.

The remaining sections of this paper are organised as follows: Section 2 contains a review of the inflation forecasting models that have been applied to South African data, while Sect. 3 describes the methodology of the various models that have been specified in this study. Details relating to the data are discussed in Sect. 4 and the results from the different forecasting models are presented in Sect. 5. Then finally, Sect. 6 concludes.

2 Review of inflation forecasting in South Africa

A number of studies have considered the relative performance of inflation forecasting models in South Africa. These include those that emphasise the structural features of an economy, where in an early study, Woglom (2005) notes that inflation forecasts that are generated from a simple Phillips curve are not particularly accurate. However, when making use of a more expansive variant of a structural model, Smal et al. (2007) suggest that such models are capable of producing quarterly forecasts for CPIX inflation that are more accurate than either the DIM or autoregressive integrated moving average model. In addition, these forecasts were also shown to be more accurate than the Reuters consensus forecast over their particular sample. Subsequent structural models, which include Liu et al. (2009), suggest that the forecasts from a small closed-economy New Keynesian dynamic stochastic general equilibrium (NKDSGE) model outperform those that are generated by classical vector autoregressive (VAR) and BVAR models for the South African GDP deflator. However, these authors also note that the difference in root-mean-squared error (RMSE) was not significant in most cases. Thereafter, Steinbach et al. (2009) extended the NKDSGE model to incorporate small open-economy features and found that the model’s forecasts for CPIX inflation provided a lower RMSE, when compared to the Reuters consensus forecast, over a horizon that extends between four and seven quarters ahead. Similarly, Alpanda et al. (2011) built upon the small open-economy NKDSGE model that is discussed in Alpanda et al. (2010a, b), to show that their model provides better forecasts for consumer price inflation over shorter horizons. Furthermore, they also show that the difference in performance relative to classical VAR, BVAR and random-walk models is significantly different from zero.

This literature has subsequently been extended to consider the performance of a small open-economy NKDSGE-VAR model in Gupta and Steinbach (2013), which generates CPIX inflation forecasts that are superior to classical VAR and most BVAR models (with the exception of a BVAR model that incorporates a stochastic search variable selection prior) over a one-quarter ahead horizon. Other researchers have considered the role of nonlinearities within structural models, where Balcilar et al. (2015) make use of a nonlinear NKDSGE model, which employs the second-order solution method of Schmitt-Grohé and Uribe (2004) and a particle filter to evaluate the likelihood function, to provide forecasts for consumer inflation that have a lower RMSE, when compared to a large variety of BVAR models (including those that employ variable selection priors). Furthermore, they found that the difference in forecasting performance is usually statistically significant, when compared to a random-walk and linear NKDSGE model (particularly over longer horizons). However, when considering the use of regime-switching nonlinearities, Balcilar et al. (2017) note that the out-of-sample forecasts for South African inflation that are generated by different forms of Markov-switching NKDSGE models are largely inferior to the single regime counterpart.

There are also a number of papers that focus on the application of different non-structural statistical techniques to forecast South African inflation, to which this paper contributes. For example, in an attempt to reduce the potential effects of an omitted variable bias, Gupta and Kabundi (2011) make use of the Stock and Watson (2002b) and Forni et al. (2000) large factor models to forecast the percentage change in the implicit GDP deflator, along with the percentage change in real per capita GDP and the 91-day Treasury Bill rate in South Africa, over a one- to four-quarter ahead period from 2001Q1 to 2006Q4. They make use of 267 quarterly macroeconomic series to show that the factor models tend to outperform the unrestricted VAR, BVAR and small closed-economy NKDSGE models. Similar results are provided in Gupta and Kabundi (2010), where it is noted that large-scale data-rich models are better suited to forecasting key macroeconomic variables, relative to small-scale models. As an alternative, Kanda et al. (2016) is one of the few studies that make use of monthly data to focus on evaluating the performance of a suite of univariate nonlinear models, which include a locally linear model tree, neuro-fuzzy, multilayered perceptron, artificial neural network, nonlinear autoregressive, and genetic algorithm-based forecasting model. Their findings suggest that the locally linear model tree provides forecasts that can compete with the linear autoregressive model and is generally superior over longer horizons. In addition, Ruch et al. (2020) derive forecasts for quarterly measures of core inflation in South Africa with the aid of time-varying parameter vector autoregressive models (TVP-VARs), factor-augmented VARs, and structural break models to show that small TVP-VARs outperform all their other models, where additional information on the growth rate of the economy and the interest rate is sufficient to forecast core inflation accurately.

3 Methodology

To describe the methodology that has been employed by the various models, it is necessary to introduce some notation. In all that follows, we assume that \({\varvec{y}} = \{y_1, \ldots , y_n\}\) is a vector of data for the measure of inflation, where the observations that arise over time are indexed by \(i \in \{1, \ldots , n\}\). The predictors, which include the price indices for the different products or categories that are sampled to construct the CPI, are contained in the matrix \({\varvec{X}} = \{x_{1,1}, \ldots , x_{n,p}\}^{\prime }\), which is of dimension \((n \times p)\), while \(j \in \{1, \ldots , p\}\) is used to denote each of the different predictors in the matrix. We made use of four lags for the predictors in each of the models. To consider the relative forecasting accuracy of the different models, we employ a recursive out-of-sample scheme that extends over a horizon of between one and twenty-four months ahead, where the data that is used to test the predictions extends over a four-year period.

The motivation for making use of a recursive forecasting scheme, as opposed to a rolling-window scheme, is that the number of available observations that have been measured over time is relatively small and the forecasts over more recent periods of time may have benefited from the use of the maximum available number of observations. For example, if we were to make use of a rolling-window scheme, then we would have been limited to making use of a constant in-sample period for the predictors of just over five years to generate a twenty-four month-ahead forecast, when using most of the statistical learning models. Since we have a large number of potential predictors, we have assumed that by making use of a slightly larger in-sample dataset, we could possibly generate more accurate forecasts for the observations that arise over more recent periods of time. Furthermore, as the structural change that is attributed to the pandemic arose relatively suddenly, and very close to the end of the sample, there are probably few (if any) gains that could be made by making use of a rolling-window scheme for the forecasts over this period.
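The recursive (expanding-window) scheme described above can be sketched as follows, using a synthetic series and a simple mean forecast as a stand-in for any of the models considered in this paper; the series, horizon and window sizes are illustrative assumptions rather than the paper's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=1.0, size=120)   # synthetic monthly inflation (%)
h, test_size = 6, 48                           # horizon and four-year test window

forecasts = []
for t in range(len(y) - test_size, len(y)):
    # recursive scheme: the in-sample window always starts at the first
    # observation and grows as the forecast origin moves forward
    history = y[: t - h + 1]
    forecasts.append(history.mean())           # stand-in for any fitted model

errors = y[-test_size:] - np.array(forecasts)
rmse = float(np.sqrt(np.mean(errors ** 2)))
```

Under a rolling-window scheme, the slice `y[: t - h + 1]` would instead have a fixed length, discarding the earliest observations as the origin advances.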

The statistics that are used to evaluate the out-of-sample performance of the respective models include the root-mean-squared error (RMSE), the mean absolute percentage error (MAPE), and the Diebold and Mariano (1995) statistics. When reporting on the results, we consider the year-on-year forecasts of headline and core inflation.
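The three evaluation statistics can be sketched as follows; the Diebold–Mariano statistic shown here is a simplified version under squared-error loss that omits the HAC adjustment for serially correlated loss differentials, and the forecast series are synthetic.

```python
import numpy as np

def rmse(y, f):
    # root-mean-squared error
    return float(np.sqrt(np.mean((y - f) ** 2)))

def mape(y, f):
    # mean absolute percentage error (assumes y is bounded away from zero)
    return float(np.mean(np.abs((y - f) / y)) * 100)

def diebold_mariano(y, f1, f2):
    # loss differential under squared-error loss; negative values favour f1
    d = (y - f1) ** 2 - (y - f2) ** 2
    return float(d.mean() / np.sqrt(d.var(ddof=1) / len(d)))

rng = np.random.default_rng(1)
y = rng.normal(5.0, 1.0, 48)            # synthetic year-on-year inflation
f1 = y + rng.normal(0.0, 0.5, 48)       # forecast with small errors
f2 = y + rng.normal(0.0, 1.5, 48)       # forecast with large errors
```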

3.1 Benchmark models

To evaluate the relative forecasting performance of the statistical learning models, we consider the use of a number of benchmarks, which are provided by autoregressive, large-scale Bayesian vector autoregressive, stochastic volatility, and random-walk models. Additional benchmarks include the model that is currently used by the central bank in South Africa to generate short-term monthly inflation forecasts, and the actual forecasts that are presented to the MPC. The latter incorporate off-model information such as electricity tariff adjustments and within-month data releases. Where there are relatively few predictors, we also include the results from a linear regression model. Further details relating to the specification of the benchmark models are included in section A of the online appendix.

3.2 Dynamic factor models

To compare the results of competing models against various dense models, which make use of principal components to summarise linear combinations of the original predictors, we utilise two different variants of the dynamic factor model (DFM). The first of these builds upon the framework of the traditional DFM, which largely follows the seminal work of Forni et al. (2000), Stock and Watson (2002a, 2002b) and Bai (2003). In addition, it also makes use of the target factor approach that follows the work of Bai and Ng (2008), while the second approach utilises the three-pass regression filter of Kelly and Pruitt (2013, 2015). Section B of the online appendix contains additional details that pertain to the specification of these models, which seek to summarise the information that is contained in a large set of predictors, or explanatory variables.
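A minimal sketch of the first step in such a dense approach, namely extracting principal-component factors from a standardised predictor panel and using them in a diffusion-index regression, is given below; the data are synthetic and the specifics (number of factors, absence of lags and factor targeting) are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 100, 40, 3                           # observations, predictors, factors
F = rng.normal(size=(n, k))                    # latent factors
X = F @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))
y = F @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=n)

# standardise the predictors and extract the first k principal components
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
factors = Z @ Vt[:k].T

# diffusion-index regression of the target on the estimated factors
A = np.column_stack([np.ones(n), factors])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = 1 - np.sum((y - A @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
```

Because the estimated factors span the latent factor space (up to rotation), the regression recovers most of the variation in the target even though none of the individual predictors is used directly.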

3.3 Variable selection models

The literature on the development of statistical learning models that employ different variable selection methods, which are particularly useful when working with a large set of sparse predictors, is extensive. In this paper, we make use of a number of alternative methods that utilise a penalised likelihood function, where the parameters are estimated with frequentist techniques, as well as some of the Bayesian model selection counterparts. In particular, we employ the least absolute shrinkage and selection operator (LASSO), which was initially proposed by Tibshirani (1996), where the size of the penalty is determined by cross-validation. Furthermore, we also make use of the adaptive LASSO of Zou (2006), which may reduce the potential over-selection problem that has been encountered with the traditional LASSO. As an alternative to making use of the adaptive LASSO, we also make use of post-selection inference, to exclude those predictors that may not be able to make a significant contribution towards the explanation of future inflationary pressure. This exercise involves the application of methods that are discussed in Lee et al. (2016). The econometrics literature also makes extensive reference to the Post-LASSO estimation methods that are discussed in Belloni et al. (2011, 2013, 2014, 2017), which, in this particular setting, would motivate the use of the methods in Belloni et al. (2013), to reduce the set of predictors to those that may be relevant.
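A minimal sketch of the LASSO step with a cross-validated penalty, using scikit-learn on synthetic data with a sparse true coefficient vector (an assumption made for illustration), is as follows:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 120, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                  # only three predictors matter
y = X @ beta + 0.5 * rng.normal(size=n)

# the size of the L1 penalty is chosen by cross-validation
model = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(model.coef_)       # indices of retained predictors
```

The cross-validated penalty typically retains the relevant predictors while discarding most of the irrelevant ones, although some over-selection can remain, which motivates the adaptive LASSO and post-selection methods discussed above.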

As an alternative to imposing \(L_1\) penalties, we also make use of methods that seek to implement \(L_0\) penalties, which in general have improved theoretical properties but are more demanding from a computational perspective. To implement these models, we follow the work of Rossell (2021). In addition, we also make use of models that impose \(L_2\) penalties, as in the case of the ridge regressions that were first discussed in Hoerl and Kennard (1970a, 1970b), which shrink the coefficient estimates towards zero without setting any of them exactly to zero. Models that make use of alternative or combined penalties are also implemented, which include the elastic net (a combination of \(L_1\) and \(L_2\) penalties) and the smoothly clipped absolute deviation (SCAD) penalty of Fan and Li (2001).
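The practical difference between the \(L_1\) and \(L_2\) penalties can be illustrated on synthetic data, where only one predictor is relevant by construction: the LASSO sets many coefficients exactly to zero, while the ridge regression only shrinks them towards zero. The penalty values below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
y = 2.0 * X[:, 0] + 0.5 * rng.normal(size=100)   # one relevant predictor

lasso = Lasso(alpha=0.1).fit(X, y)               # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)              # L2 penalty

n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))   # exact zeros under L1
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))   # ridge only shrinks
```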

As an alternative to making use of frequentist techniques, we also employ Bayesian model selection methods that consider the use of models that contain different sets of regressors, following Johnson and Rossell (2010, 2012) and Rossell and Telesca (2017). The results for the single specification that is most likely to contain the most useful predictors are reported along with the specifications that are summarised with Bayesian model averaging techniques. Additional details relating to the use of each of the variable selection methods have been included in section C of the online appendix.

3.4 Nonlinear statistical learning models

The suite of nonlinear statistical learning models, which may also contain nonparametric features, include ensemble methods, random forests, gradient boosting and neural networks. Further details relating to each of these methods are contained in section D of the online appendix.

3.4.1 Ensemble methods

For comparative purposes, we have also made use of an ensemble method, which takes the form of the complete subset regression (CSR) framework of Elliott et al. (2013, 2015). This procedure makes use of the results from independent models that are then combined with a deterministic calculation. In many respects, it is similar to the bagging procedure of Breiman (1996) and provides an intuitive method for generating forecasts from many variables. To apply this methodology, we fit a linear regression model that seeks to explain \(y_{i}\) using each of the individual regressors in \(x_{i-h}\). To identify the best predictors, we then rank the absolute values of the t-statistics from these initial coefficient estimates. The best-ranked predictors are used to generate a number of individual forecasts from regressions on subsets of fixed size, which are then combined to provide the CSR forecast.
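A simplified sketch of this procedure on synthetic data, with illustrative choices for the number of retained predictors and the subset size, might look as follows:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, p = 120, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=n)  # two relevant predictors

# rank each predictor by the absolute t-statistic of a univariate regression
tstats = []
for j in range(p):
    x = X[:, j]
    b = (x @ y) / (x @ x)
    resid = y - b * x
    se = np.sqrt((resid @ resid) / (n - 1) / (x @ x))
    tstats.append(abs(b / se))
top = np.argsort(tstats)[::-1][:4]                # keep the four best predictors

# average the forecasts from all two-predictor subset regressions
x_new = rng.normal(size=p)                        # hypothetical latest predictors
preds = []
for subset in combinations(top, 2):
    cols = list(subset)
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    preds.append(beta[0] + x_new[cols] @ beta[1:])
csr_forecast = float(np.mean(preds))
```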

3.4.2 Random forests

The random forest model of Breiman (2001) reduces the variance of regression trees, which are nonparametric models that approximate an unknown nonlinear function with local predictions, using recursive partitioning of the space that pertains to the covariates. The method applies bootstrap aggregation (bagging) to a collection of randomly constructed regression trees, where each tree is grown by successively splitting the covariate space to minimise the sum of squared errors in the regression.
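The variance-reduction effect of bagging randomised trees can be illustrated with scikit-learn on a synthetic nonlinear signal, where a single fully grown tree is compared against a forest (all settings here are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(300, 5))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=300)   # nonlinear signal plus noise

X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# averaging over bootstrapped, randomised trees reduces out-of-sample variance
mse_tree = float(np.mean((y_te - tree.predict(X_te)) ** 2))
mse_forest = float(np.mean((y_te - forest.predict(X_te)) ** 2))
```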

3.4.3 Gradient boosting

As an alternative to random forests, gradient boosting seeks to build a model by repeatedly fitting a regression tree to the residuals. After each tree has been grown to model the residuals, it is shrunk down by a factor before it is added to the current model. This allows the model to capture certain elements (including nonlinear relationships) that may otherwise have been discarded in the residual. A general gradient descent boosting paradigm has been developed for additive expansions based on any fitting criterion, which utilises the developments discussed in Friedman et al. (2000) and Friedman (2001), where special enhancements are derived for the particular case where the individual additive components are regression trees. In general, it has been suggested that gradient boosting of regression trees produces competitive and highly robust results.
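The fit-shrink-add cycle described above can be sketched directly, using shallow scikit-learn regression trees as the additive components; the learning rate, tree depth and number of iterations are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=300)

prediction = np.zeros(300)
shrinkage = 0.1                                     # shrink-down factor
for _ in range(100):
    residual = y - prediction                       # what remains unexplained
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += shrinkage * stump.predict(X)      # shrink, then add

mse_boost = float(np.mean((y - prediction) ** 2))
mse_constant = float(np.mean((y - y.mean()) ** 2))
```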

3.4.4 Deep learning (neural networks)

Neural network models usually take the form of highly parameterised nonparametric specifications that are potentially able to approximate any nonlinear function. These models often make use of a large number of parameter weights that transform the data that is contained in the set of predictors to fit the target variable, where the weights are learnt through repeated exposure to different subsets of the data. Deep learning methods make use of layered representations of neural network models that are stacked on top of one another to provide a mathematical framework for learning the rules that map the characteristics of the predictors to the target variable. Such models could potentially explain behaviour that is extremely complex, although there is also a significant possibility that the model may be prone to over-fitting errors. In our case, we have utilised a relatively simple model structure that should circumvent such concerns, where we have incorporated three hidden layers with a relatively parsimonious combination of 32, 16, and 8 nodes.
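A minimal sketch of such a network, with the three hidden layers of 32, 16 and 8 nodes mentioned above, can be written with scikit-learn's multilayer perceptron; the synthetic data, optimiser settings and use of scikit-learn rather than a dedicated deep learning library are assumptions made for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(8)
X = rng.normal(size=(400, 10))
y = np.tanh(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

# three hidden layers with 32, 16 and 8 nodes, as in the specification above
net = MLPRegressor(hidden_layer_sizes=(32, 16, 8), max_iter=2000,
                   random_state=0).fit(X[:300], y[:300])

mse_net = float(np.mean((y[300:] - net.predict(X[300:])) ** 2))
mse_constant = float(np.mean((y[300:] - y[300:].mean()) ** 2))
```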

4 Data

The South African Consumer Price Index (CPI) measures changes in the general level of prices of consumer goods and services. It is a fixed-basket price index, in that it represents the cost of purchasing a fixed basket of consumer goods and services of constant quality and similar characteristics (Statistics South Africa 2017a). The items that are included in the basket seek to represent average household expenditure, using information from the Income and Expenditure Survey (IES) and more recently from the Living Conditions Survey (LCS), which was last conducted in 2014/15. Note that the index only incorporates data on those products that contribute at least 0.1% of total household expenditure. Additional data sources such as regulatory reports, excise tax receipts, industry association reports and summarised transaction data from retailers are then used to align the data from the respective surveys with the data that goes into the household final consumption expenditure in the national accounts. The last update to the items that are included in the CPI basket was in January 2017, and the next update is expected to take place during 2021 (Statistics South Africa 2017b).

Since 2006, StatsSA has made use of fieldworkers who are responsible for collecting the relevant prices from the retail outlets directly. Each province has its own basket and every product that appears in at least one provincial basket is included in the national basket. The current CPI contains 412 products, which is slightly more than the previous basket, which included 393 products (Statistics South Africa 2017b), and its composition follows the United Nations Statistical Division (UNSD) standard for classifying household expenditure on goods and services. This standard is termed the Classification of Individual Consumption by Purpose (COICOP) and it currently incorporates 14 high-level (or 2-digit) categories (e.g. 01-Food and non-alcoholic beverages). Table 1, which is taken from Statistics South Africa (2017a), shows how the naming convention of the COICOP has been applied to the different levels of products and categories in South Africa.

Table 1 Convention for COICOP classification

In the subsequent analysis, we make use of the monthly four-digit data on consumer prices from January 2008 to March 2021, since a slightly different methodology was used to collect and classify the data for prior periods of time. This dataset includes a total of 46 different predictors. In addition, we have also made use of a new dataset that contains more disaggregated data on the prices of goods in the consumption basket. This dataset includes information on 216 products or categories, where food products are measured at the 8-digit level and all other goods and services are measured at the 5-digit level. Unfortunately, the first observation in this dataset relates to January 2017, which would make for an extremely small in-sample training period in our case. Therefore, with the help of StatsSA, we have extended this dataset back to January 2009, by making use of the fieldworker data, which has been collected for 34,075 different products, across 5,505 outlets, in 85 different areas.

To obtain a measure for the changes in prices over time, we calculate the price relative indices for the available fieldworker data, utilising the method that is used in the compilation of the respective CPI indices. This procedure involves the construction of a Jevons index, which is defined as the unweighted geometric mean of the price ratios that utilise data for the current and previous survey periods for a particular commodity (i.e. at the 8-digit level). Such a Jevons index may be constructed as follows:

$$\begin{aligned} \mathbb {I}^{J}_{t} = \prod ^{\xi }_{\theta =1} \left( \frac{P_{\theta ,t}}{P_{\theta ,t-1}}\right) ^{1/\xi } \end{aligned}$$
(1)

where \(\mathbb {I}^{J}_{t}\) denotes the Jevons index in period \(t\), while \(P_{\theta ,t}\) is the price of commodity \(\theta \) in period \(t\), and \(\xi \) refers to the total number of items that are included in this calculation. In this study, we calculate a number of different variants of price relative indices to obtain information about price movements. This is then used to construct individual indices for each of the components at different levels of aggregation. After completing this process, we are left with 216 predictors for the eight-/five-digit data, over the sample period from January 2009 to March 2021.
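The calculation in Eq. (1) can be verified on a small set of hypothetical prices, where the Jevons index is simply the unweighted geometric mean of the price ratios between two consecutive survey periods:

```python
import numpy as np

# hypothetical prices for four items in two consecutive survey periods
p_prev = np.array([10.00, 15.50, 7.20, 22.00])
p_curr = np.array([10.50, 15.50, 7.56, 23.10])

xi = len(p_curr)                                    # number of items in the cell
jevons = float(np.prod((p_curr / p_prev) ** (1.0 / xi)))
```

Since three of the four items rose by 5% and one was unchanged, the resulting index is a little under 1.04, i.e. below the arithmetic mean of the ratios, as expected for a geometric mean.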

Figure 1 displays the measures of headline inflation and core inflation over the entire sample, where the shaded area denotes the out-of-sample period, over which we assume that no future information relating to the outcome variable and predictors is available when estimating the parameters in the different models. The initial observation in the out-of-sample period is April 2017. Note that the trend in both measures of inflation has declined over the out-of-sample period, which would suggest that most mean-reverting models will produce a negative forecast bias. In addition, as would be expected, headline inflation is certainly much more volatile than core inflation, where over the out-of-sample period, core inflation has a variance of 0.61%, while the variance of headline inflation is 1.04%.

Fig. 1

Inflation over initial in-sample and out-of-sample period (year-on-year)

5 Results

To evaluate the relative performance of the different models, we make use of a recursive out-of-sample forecasting exercise and a forecasting horizon of between one and twenty-four months ahead. The statistics that are used to evaluate the out-of-sample performance of the respective models include the root-mean-squared error (RMSE), the mean absolute percentage error (MAPE) and the Diebold and Mariano (1995) statistics.Footnote 13 When reporting on the results, we compare the year-on-year inflation forecasts against the official CPI release for headline and core inflation.Footnote 14 To generate forecasts, we mostly make use of the direct forecasting approach, where the only exceptions pertain to the random walk, DIM, autoregressive, and vector autoregressive models.Footnote 15 To apply the direct forecast approach over the forecasting horizon of \(h=\{1,\ldots , 24\}\), we estimate the model \({y}_{i}= \sum ^{p}_{j=1} {x}_{i-h,j} \beta _j + \epsilon _i\) and use the coefficients to find \(\mathbb {E}_i \left[ y_{i+h} | {x}_{i,j}\right] = \sum ^{p}_{j=1} {x}_{i,j} \hat{\beta }_j\), where the predictors may include lagged values of the target variable.
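To make the direct forecasting approach and the evaluation statistics concrete, the following Python sketch regresses \(y_t\) on a single lag \(y_{t-h}\) by closed-form OLS and applies the fitted coefficients to the latest observation, alongside simple RMSE and MAPE functions. The series is hypothetical, and the sketch uses one predictor rather than the full set of price relatives used in the paper:

```python
import math

def ols_fit(x, y):
    """Closed-form simple OLS of y on x: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def direct_forecast(y, h):
    """Direct h-step forecast: regress y_t on y_{t-h} over the sample,
    then apply the fitted coefficients to the latest observation."""
    a, b = ols_fit(y[:-h], y[h:])  # pairs (y_{t-h}, y_t)
    return a + b * y[-1]           # E[y_{T+h} | y_T]

def rmse(actual, predicted):
    """Root-mean-squared error of a set of forecasts."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error of a set of forecasts."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical year-on-year inflation series (per cent)
series = [4.5, 4.6, 4.4, 4.7, 4.9, 4.8, 4.6, 4.3, 4.1, 4.0, 3.8, 3.6]
print(direct_forecast(series, h=3))
```

The key feature of the direct approach is that a separate model is estimated for each horizon h, so no forecast needs to be iterated forward through intermediate periods.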

Table 2 summarises the main results, where we compare the RMSE for the official SARB forecasts, which were reported to the MPC, to the best-performing statistical learning model. We note that over shorter horizons, the official SARB forecasts, which have benefited from the use of off-model information and within-month data updates, are generally superior. However, over longer horizons, the nonlinear statistical learning models provide more impressive results. When considering the results for headline inflation, the boosting model that is applied to the four-digit data provides the most attractive results, when the horizon is twelve months or greater. Similarly, when applied to the eight-/five-digit data, the neural network model also appears to be responsible for lower RMSE statistics over longer horizons, when compared to the official SARB forecasts. However, they are not superior to the results of the boosting model that is applied to four-digit data.

Table 2 Root-mean-squared error

For core inflation, the results are similar, as the official SARB forecasts are superior over a horizon of between one and three months. However, at horizons of four steps ahead and longer, the neural network model is able to generate a lower RMSE when applied to eight-/five-digit data. Furthermore, the random forest model, which is applied to four-digit data, is also responsible for a smaller forecasting error (compared to the other predictions in the table) when the horizon is greater than a year and less than two years ahead.

In addition to these results, we also report on the Shapley values for a selected statistical learning model that appears to provide attractive out-of-sample results, to identify the important drivers of future inflationary pressure. This work follows Lundberg and Lee (2017) and Joseph et al. (2020), and the results are contained in section H of the online appendix.
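For readers unfamiliar with the attribution method of Lundberg and Lee (2017), the following Python sketch computes exact Shapley values by enumerating all feature coalitions, with absent features replaced by baseline values. This brute-force version is feasible only for a handful of predictors, and the two-feature linear predictor is purely illustrative:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline),
    enumerating every feature coalition (feasible for a few predictors only).
    Features outside the coalition are set to their baseline values."""
    p = len(x)
    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            for S in combinations(others, size):
                # Shapley kernel weight for a coalition of this size
                weight = factorial(size) * factorial(p - size - 1) / factorial(p)
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(p)]
                without_i = [x[j] if j in S else baseline[j] for j in range(p)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# For a linear predictor, the attributions recover the coefficient effects
linear = lambda z: 2 * z[0] + 3 * z[1]
print(shapley_values(linear, [1.0, 1.0], [0.0, 0.0]))  # → [2.0, 3.0]
```

The attributions sum to the difference between the prediction at x and at the baseline, which is the property that makes them useful for decomposing a model's inflation forecast into contributions from individual price indices.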

5.1 Four-digit data: headline inflation

Headline inflation is made up of forty-six different price indices that are measured at the four-digit level of aggregation. These indices are used to generate the DIM forecast, which has a significant influence over the official central bank near-term monthly inflation forecast. Given the relatively small number of predictors, there are sufficient degrees of freedom to be able to include the forecasts from a linear regression model in this case.

Table 3 contains the out-of-sample RMSE statistics. When comparing the relative performance of all the benchmark models, we note that with the exception of the linear regression model, the errors are all fairly similar, where the DIM and official SARB forecasts are superior over the short term, while the random walk and stochastic volatility models are superior over longer horizons. Note also that over the first three months, the official SARB forecasts provide an RMSE that is about half the size of that of the DIM, which suggests that the use of off-model information has reduced the forecasting error by a relatively large amount over these horizons.

Turning our attention to the relative forecasting performance of the linear regression model, we note that it provides results that are indicative of a model that is prone to the over-fitting problem, since the models that make use of variable selection techniques often provide more impressive results. This would also suggest that the matrix that contains the predictors may be sparse, although we do not make use of a specific definition for statistical sparsity, as in McCullagh and Polson (2018). Further support for this finding is included in section E of the online appendix, where the in-sample estimation results for the full sample suggest that a model that makes use of twelve explanatory variables is able to provide a near-perfect explanation of the behaviour that is measured by headline inflation.
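The way in which a variable selection technique can exploit such sparsity may be illustrated with a minimal coordinate-descent LASSO sketch in Python. The data and penalty value below are hypothetical, and the soft-thresholding step is what sets the coefficients on uninformative predictors to exactly zero:

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrinks towards zero,
    and returns exactly zero inside the interval [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, iters=100):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # residual with predictor j excluded from the current fit
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b

# Hypothetical data: y depends only on the first column of X
X = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]
print(lasso_cd(X, y, lam=0.75))  # second coefficient is shrunk to exactly zero
```

Because coefficients that fall below the penalty threshold are set to exactly zero rather than merely shrunk, the LASSO performs variable selection and estimation simultaneously, which is what distinguishes it from ridge regression in this setting.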

Table 3 Root-mean-squared error (four-digit headline inflation)

When we compare the accuracy of the forecasts from the dense models relative to the sparse models, we note that the results are somewhat mixed: although there are a number of cases where the variable selection techniques provide more impressive results, the DFMs are at least competitive in all cases and superior over longer horizons. In the case of the twenty-four-month-ahead forecast, this would imply that the observed values of the predictors from twenty-four months ago, which provide the best explanation of current headline inflation, are not necessarily the same as the ones that provide the best twenty-four-month-ahead forecast from the current point in time. Furthermore, we also note that in this case, the variable selection techniques would in most cases appear to be inferior to the benchmarks, which include the autoregressive, stochastic volatility and random walk models. Note also that the results for the nonlinear statistical learning models are in most cases similar to those of the DFMs; however, over horizons that are longer than six months, the model that makes use of boosting methods provides forecasts that are more accurate than those of both the dense and sparse models.

Table 4 Diebold–Mariano statistics (four-digit head)

Table 4 contains the Diebold and Mariano (1995) statistics for the forecasts of the different models, relative to what is produced by the random-walk model. In this case, we note that the only forecasts that are significantly superior to the random-walk forecast are provided by the DIM over a one-month horizon and by the official SARB forecast over a one- and two-month horizon. Furthermore, the forecasting performance of the DFMs, relative to the random-walk forecast, is not significantly different from zero over all horizons, while the random-walk forecast provides a significant improvement in forecasting performance over several horizons, relative to most of the models that employ variable selection techniques.
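A simplified version of the Diebold and Mariano (1995) statistic may be sketched as follows, using squared-error loss differentials. Note that this one-step version omits the HAC (long-run) variance correction required for multi-step forecast errors, and the sign convention (positive when the competing model has the lower loss, matching the interpretation of the tables) is adopted here for illustration:

```python
import math

def dm_statistic(e_benchmark, e_model):
    """Diebold-Mariano statistic on squared-error loss differentials.
    Positive values favour the competing model (lower loss than the benchmark).
    This simplified version omits the HAC variance correction that is
    required for multi-step (h > 1) forecast errors."""
    d = [eb ** 2 - em ** 2 for eb, em in zip(e_benchmark, e_model)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((di - d_bar) ** 2 for di in d) / (n - 1)
    return d_bar / math.sqrt(var_d / n)

# Hypothetical one-step forecast errors from a benchmark and a competing model
print(dm_statistic([2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.5]))
```

Under the null of equal predictive accuracy, the statistic is compared against standard normal critical values, so large positive values indicate a significant improvement over the benchmark forecast.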

5.2 Eight/five-digit data: headline inflation

When using data that is measured at the eight-digit level of aggregation for food items and at the five-digit level for most other goods, we have a total of two-hundred-and-ten different price indices for headline inflation. Given the relatively large number of predictors, we do not have sufficient degrees of freedom to estimate a linear regression model. Table 5 contains the out-of-sample RMSEs for the different models, where most of the results that pertain to the benchmarks are similar to what was provided when using four-digit data, with the exception of the BVAR, which has experienced a slight deterioration in performance.

Note that the RMSEs for sparse models are lower when using the more disaggregated data, which would suggest that the combined use of more disaggregated data and variable selection techniques allows for an improved forecasting performance, as it would discard some of the noise that may be included in the variables when they are subject to greater degrees of aggregation. This is in contrast to the results of the dense models, which provide forecasts that are slightly less accurate than those that were derived from the four-digit data. Finally, the results for the nonlinear statistical learning models are in most cases comparable to those that make use of the four-digit data.

Table 5 Root-mean-squared error (8/5-digit headline)
Table 6 Diebold–Mariano statistics (8/5-digit head)

Table 6 contains the Diebold and Mariano (1995) statistics, which are measured relative to the forecasts from the random-walk model. Once again, the only forecasts that are significantly superior to the random walk are provided by the DIM over a one-month horizon and by the official SARB forecast over a one- and two-month horizon. Furthermore, the forecasting performance of the LASSO at the three-month horizon is significantly more accurate than what is provided by the random-walk model, while most of the other variable selection forecasts are either positive (which is due to their lower RMSE) or not significantly different from zero.

5.3 Four-digit data: core inflation

In what follows, we repeat the above analysis, but in this case, the target variable is core inflation, which is derived from the measure of CPI that excludes the effects of changes in the prices of food, non-alcoholic beverages, fuel and energy. When using the four-digit data, we are able to make use of thirty-three different predictors for core inflation. After applying this data to the respective models, we calculate the RMSEs, which are displayed in Table 7. Note that in this case, the SARB forecasts are superior over a one- and two-month horizon, while the random-walk model provides superior forecasts for between three and six months ahead. Thereafter, the lowest RMSEs are provided by the ridge and random forest models. Once again, some of the worst results are provided by the linear regression model, and after generating the in-sample summary statistics for the models that make use of variable selection techniques, we observe that the matrix that contains the predictors displays sparse characteristics.

Table 7 Root-mean-squared error (four-digit core)
Table 8 Diebold–Mariano statistics (four-digit core)

Table 8 contains the Diebold and Mariano (1995) statistics, which suggest that there is no occasion where the difference in forecasting performance, relative to the random-walk, is significantly different from zero, in favour of the competing model (even in the case of the short-term SARB forecasts). In addition, there are also a number of occasions where the random-walk model provides results that are significantly more accurate than any of the competing models.

5.4 Eight/five-digit data: core inflation

After excluding those items that are not included in the definition of core inflation, we are left with eighty-three price indices, which are measured at a five-digit level, since this measure does not include any food items. Table 9 contains the out-of-sample RMSEs for the different models, where we note that the results are fairly similar to the case where we made use of less disaggregated data. In this case, there is only one occasion where the random-walk model does not generate the lowest RMSE over the medium- to long-term horizon.

Table 9 Root-mean-squared error (8/5-digit core)

The Diebold and Mariano (1995) statistics that are contained in Table 10 suggest that there is no occasion where there is a significant difference in forecasting performance, in favour of the models that are competing with the random-walk.

Table 10 Diebold–Mariano statistics (8/5-digit core)

5.5 Change in the inflationary level or trend

Following the onset of the COVID-19 pandemic, South Africa initially went into lockdown on 27 March 2020. The use of these regulations resulted in what could be described as a level-shift in the rate of inflation, where between April 2019 and March 2020, year-on-year headline inflation averaged 4.2%, while between April 2020 and March 2021, it only averaged 2.9% (which is below the lower bound of the inflation target). In what follows, we discuss the relative performance of the different models following this change in the data-generating process, given the limitation that we only have twelve observations that arise after the onset of the pandemic.

In Table 11, which follows the format of Table 2, we compare the RMSE results for the official SARB forecasts to the best-performing dynamic factor, variable selection and nonlinear statistical learning models for the out-of-sample period that extends between April 2020 and March 2021. The full results for all the models over this out-of-sample period have been included in section G of the online appendix. Note that for the twelve-step-ahead forecast, we are only able to calculate the RMSE for a single realisation, and as such it is difficult to read too much into this result, while the RMSE for the one-step-ahead forecast is computed over the average of twelve successive forecasts.Footnote 16

Table 11 Root-mean-squared error

These results suggest that if we were to impose a limit on the forecasting horizon at eight steps ahead (or where we have at least five successive forecasts to evaluate), then there is always a statistical learning model that provides a lower RMSE, relative to the official SARB forecasts that may benefit from the use of off-model and within-month information. In addition, we also note that, in general, for headline inflation, the variable selection models perform reasonably well, which may suggest that the removal of those variables that are unable to make a significant contribution towards the predictive ability of the model provides more accurate forecasts (where one would presume that the variables that are removed are unable to contribute towards the explanation of the level shift). Similarly, for core inflation, where the number of available predictors is somewhat limited, combining all the available information within a DFM would in most cases provide the most desirable forecast.

Fig. 2

Forecasts from benchmark and statistical learning models—headline inflation

Figure 2 contains the results of the recursive one-step-ahead forecasts that were generated for headline and core inflation, by the LASSO and DFM (with target factors), between April 2020 and March 2021. In both cases, the statistical learning models appear to have done a reasonable job of detecting the relative change in the inflationary level or trend.Footnote 17 While these results are of interest, particularly to those who are concerned with the relative performance of various models over the pandemic, one should be cautious of reading too much into them as they have been generated from a very small sample.

6 Conclusion

We assess the potential predictive power of a number of different forecasting models that may be applied to large datasets that are used to measure inflation. We find that the models that employ variable selection and nonlinear statistical learning techniques provide impressive results, despite the fact that the number of observations in the dataset is limited. We also note that when comparing the use of models that seek to exploit any potential sparsity in the set of predictors, relative to those that seek to summarise all of the available information, the results are somewhat mixed over the entire out-of-sample period. Over horizons that are longer than three months, the nonlinear statistical learning models would also appear to provide results that are even more accurate than the sparse models, where the neural network and boosting models provide the most accurate results. However, simple forecasting models continue to produce results that are in many cases superior to those of the statistical learning models. Hence, one would conclude that from a practical perspective, the use of statistical learning models in this particular setting may not provide forecasts that are consistently superior to what is provided by a simple random-walk model, although they are certainly competitive.

Furthermore, the results suggest that for headline inflation the official central bank forecast that is presented to the MPC, which incorporates various sources of off-model and within-month information, is more accurate than any of the other models over the first three months. Similarly, over a one-month horizon, the central bank forecast for core inflation is more accurate than any of the other models. Hence, the use of judgement has systematically improved the SARB forecasts over a short-term horizon. Another important finding relates to the use of more disaggregated data, where the results from the eight-/five-digit data are generally more accurate than those from the four-digit data. In particular, those models that are able to distinguish between information that may or may not be of potential use are able to provide more accurate forecasts when they are applied to more disaggregated data. As has been shown, we can also use the output from the models to generate Shapley values, which provide policymakers with information that pertains to the drivers of future inflationary pressure. In addition, when we consider the relative performance of the benchmark models, which include a number of mean-reverting specifications, over the period that includes the effects of the economic lockdowns during the pandemic, we note that the statistical learning models are able to detect the decrease in the trend of the respective measures of inflation reasonably quickly, to provide short-term forecasts that are more accurate than what was provided to the MPC.

Subsequent research into the use of alternative sources of big data, as well as the potential use of alternative statistical learning model specifications, may provide more promising forecasting results in the future. As has been noted, the number of available observations over time for this dataset is relatively limited, and it is generally acknowledged that to provide impressive results in such a setting, statistical learning models, and in particular the nonlinear variants of these models, would usually require a relatively large number of observations that have been measured over time. Nevertheless, the fact that the forecasts from many of these models are competitive, despite the limitations of the data, may provide encouraging signs for researchers in this field of study.