1 Introduction

In a time-series regression model, economic variables are often available at different time frequencies. However, conventional regression analysis is constrained to variables measured at the same frequency, typically in an aggregated form. This limitation discards valuable information from higher-frequency variables solely because of technical constraints. The scientific drawback of always aggregating to a lower common frequency, and thereby disregarding high-frequency information in the data, is widely acknowledged. However, it was not until Ghysels et al. (2004, 2005, 2006) published their influential series of papers that these models gained substantial traction within the research community. Ghysels et al. (2004) introduced the term “mixed-data sampling” (MIDAS) to describe this filtering method, which enables the incorporation of multiple frequencies in prediction models, particularly when the variable with the lowest frequency appears on the left-hand side of the autoregressive distributed lag (ADL) regression equation. The general idea of a MIDAS regression is that it allows a low-frequency regressand to be predicted as a function of regressors sampled at higher and/or the same frequencies. We may incorporate more than one high-frequency regressor and other regressors measured at the low frequency of the regressand, alongside lagged values of the regressand. Thus, the standard MIDAS regression is a general ADL model with a mix of low- and high-frequency variables used to forecast the low-frequency regressand. Clearly, if the regressand is measured at a quarterly frequency, regressors based on monthly or daily frequencies contain more timely and relevant information than the corresponding quarterly sampled regressors or monthly data aggregated to quarterly frequencies. The relevance of MIDAS-type regression is obvious since many macro variables, such as GDP, unemployment, and inflation, are not always updated at high frequencies, while energy data, such as oil prices, are updated frequently. Furthermore, some energy data, such as carbon dioxide emissions, are updated at a lower frequency than, for example, many macro indicators. Therefore, it is intuitive that additional high-frequency information will enhance the predictive performance of most forecasting models in energy economics. This advantage can also be seen in recent research by, for example, Valadkhani and Smyth (2017), Pan et al. (2018), Zhang and Wang (2019), and Wang et al. (2020), where energy data are used to forecast macro variables, or vice versa, depending on the frequency at which different variables are collected.

Traditional MIDAS regression models are usually based on various forms of ADL models, but this vibrant emerging research field is developing rapidly. Ghysels and coauthors were inspired by Almon (1965), who utilized the Weierstrass approximation theorem, which states that every continuous function defined on a closed interval [a, b] can be uniformly approximated, as closely as desired, by a polynomial function of finite order. This is known as the Almon polynomial distributed lag (PDL) weighting, which imposes restrictions on the lag coefficients in autoregressive models and can be used for mixed-frequency weighting. However, the theorem does not state which order of polynomial to use, which results in a model-selection problem and, in turn, potential misspecification issues. The advantage, however, is that the polynomial results in fewer estimated parameters, usually a polynomial of order 2, 3, or possibly 4. Thus, the number of estimated coefficients depends on the polynomial order and not on the number of high-frequency lags. Therefore, when utilizing the parsimonious Almon estimator, fewer parameters need to be estimated. Since the seminal paper by Ghysels et al. (2004), there has been a substantial and growing literature on various MIDAS regression approaches considering different types of conditions. See, among others, Andreou et al. (2010, 2011, 2013), Clements and Galvão (2008, 2009), Ghysels et al. (2005, 2006, 2007), Ghysels and Wright (2009), Monteforte and Moretti (2013), Kvedaras and Račkauskas (2010), Modugno (2013), Penev et al. (2014), and Pan et al. (2018). When the frequency mismatch is small, Foroni et al. (2015) presented the methodology of unrestricted MIDAS (U-MIDAS), which is based on simple OLS.

To decrease the number of estimated coefficients, these methods are generally based on different parametric estimators, which require the parameters to follow a certain pattern, such as that produced by Almon lags. Breitung and Roling (2015) argue that the requirement for the parameter to follow a certain pattern is a major shortcoming of these parametric MIDAS methods. A common empirical situation when this does not hold is when the short-term effect of the high-frequency variable is larger than the long-term effect, which leads to a mixture of positive and negative coefficients. Therefore, Breitung and Roling (2015) suggest a new nonparametric MIDAS estimator (\({SLS}_{1}\)) for which it is not necessary to impose a certain pattern to reduce the number of estimated coefficients. This shrinkage method of Breitung and Roling (2015) has a ridge interpretation and outperforms the parametric MIDAS, especially in the presence of sign-changing parameters.

Compared to ordinary least squares (OLS) estimators, a negative property of shrinkage estimators such as \({SLS}_{1}\) is that they decrease the goodness-of-fit, measured by the coefficient of multiple determination of the regression model. Our suggested approach, based on Lipovetsky and Conklin (2005), improves the out-of-sample performance by improving the in-sample fit.

The purpose of this paper is to improve the Breitung and Roling (2015) estimator, \({SLS}_{1}\), by introducing a two-parameter estimator, \({SLS}_{2}\), based on the work by Lipovetsky and Conklin (2005). This estimator improves the goodness-of-fit of the regression model (see Toker 2020). Furthermore, the increase in the goodness-of-fit obtained by introducing the second parameter also decreases the estimation error, since the mean squared error (MSE) of the estimated parameter vector decreases. This final point is important in forecasting since the forecast error can be decomposed into an estimation error and a random error (see Breitung and Roling 2015). Therefore, our new estimator has the potential to improve forecasts by decreasing the estimation error. Hence, our new approach improves the out-of-sample performance by increasing the goodness-of-fit. We state the conditions under which our newly proposed \({SLS}_{2}\) estimator is superior to the \({SLS}_{1}\) estimator in terms of the MSE. Furthermore, we investigate the small-sample properties using a simulation study. Finally, by forecasting the inflation rate based on crude oil returns, we empirically demonstrate that this decreased estimation error can lead to improved out-of-sample forecasts. In this example, we forecast inflation rates using a simple AR model, MIDAS with Almon lags, and the \({SLS}_{1}\) and proposed \({SLS}_{2}\) estimators, and we show that our new proposed estimator minimizes the forecasting error.

2 Statistical methodology

2.1 Estimation procedure and the proposed estimator for mixed frequency models

Consider the regression model

$${y}_{t+h}={\alpha }_{0}+\sum_{j=0}^{p}{\beta }_{j}{x}_{t,j}+{\varepsilon }_{t+h}$$
(1)

which combines a low-frequency variable \({y}_{t}\) and a high-frequency variable denoted \({x}_{t,j}\). The time index \(t=1,\dots ,T\) refers to the dependent variable measured at the lower frequency, while \(j\) indexes the intraperiod observations of the independent variable. To simplify, we use the same notation as Breitung and Roling (2015). For convenience, we assume that the \(j\) index runs in reverse direction; for instance, the pair \(t\) and \({n}_{t}\) denotes the first intraperiod observation, while the pair \(t\) and 0 denotes the final (most recent) observation. Furthermore, we consider only one independent variable, which enables us to write the MIDAS regression as the linear regression model shown in Eq. (1). The error term is denoted \({\varepsilon }_{t+h}\), and it is assumed to follow the Gauss–Markov assumptions. The lag length \(p\) is assumed to be smaller than the minimum number of intraperiod observations of the independent variable. It is possible to rewrite Eq. (1) using the following matrix notation for simplicity, where we ignore the intercept term \({\alpha }_{0}\):

$${{\text{y}}}^{{\text{h}}}=X\beta +\epsilon ,$$
(2)

where \({{\text{y}}}^{{\text{h}}}={\left[{{\text{y}}}_{1+{\text{h}}},{{\text{y}}}_{2+{\text{h}}},\dots ,{{\text{y}}}_{{\text{T}}+{\text{h}}}\right]}^{\prime}\), \({\text{X}}={\left[{{\text{x}}}_{1},\dots ,{{\text{x}}}_{{\text{T}}}\right]}^{\prime}\) and \({{\text{x}}}_{{\text{t}}}={\left({{\text{x}}}_{{\text{t}},0},{{\text{x}}}_{{\text{t}},1},{{\text{x}}}_{{\text{t}},2},\dots ,{{\text{x}}}_{{\text{t}},{\text{p}}}\right)}^{\prime}\). Furthermore, \(\beta ={\left({\beta }_{0},{\beta }_{1},\dots ,{\beta }_{p}\right)}^{\prime}\) and \({{\text{x}}}_{{\text{t}}}\) are the regression coefficients and the intraperiod high-frequency observations, respectively. \(\epsilon\) is the matrix notation of \({\varepsilon }_{t+h}\); it is uncorrelated with \({{\text{x}}}_{{\text{t}},0},{{\text{x}}}_{{\text{t}},1},{{\text{x}}}_{{\text{t}},2},\dots ,{{\text{x}}}_{{\text{t}},{\text{p}}}\) and \(\epsilon \sim {\text{IN}}(0,{\upsigma }_{\epsilon }^{2})\). For a thorough exploration of the \(SLS\) estimator and its various interpretations, we refer to Breitung and Roling (2015). To estimate the above equation, the \(SLS\) estimator minimizes the following penalized least squares objective function:

$$\widetilde{{\text{S}}}\left({\mathrm{\alpha }}_{0},\upbeta \right)={\Vert {{\text{y}}}^{{\text{h}}}-\mathrm{X\beta }\Vert }^{2}+\uplambda \sum_{j=2}^{p}{\left({\nabla }^{2}{\upbeta }_{j}\right)}^{2},$$
(3)
$${\nabla }^{2}{\upbeta }_{j}={\upbeta }_{j}-2{\upbeta }_{j-1}+{\upbeta }_{j-2},\quad j=2,\dots ,p,$$

where \(\lambda\) is a prespecified smoothing parameter. Notably, when \(\lambda\) is set to zero, the above equation reduces to the least squares objective function. Adding a penalization term to the objective function to reduce overparameterization is a classical statistical shrinkage method, the most notable contribution being Hoerl and Kennard (1970a, b), who introduced the classical ridge estimator. By solving for \(\upbeta\) when the first-order derivative of Eq. (3) is set to zero \(\left(\partial \widetilde{{\text{S}}}\left(\upbeta \right)/\partial \upbeta =0\right)\), the resulting estimator from Breitung and Roling (2015) can be written as:

$${\widetilde{\upbeta }}_{\uplambda }={\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}r,$$
(4)

where the variance of the standardized dependent variable \({{\text{y}}}^{h}\) equals one, \({\text{C}}\) denotes the correlation matrix of the regressors, and \({\text{r}}\) denotes the vector of correlations between the dependent variable and the regressors:

\({\text{C}}={{\text{X}}}^{\prime}{\text{X}},\ {\text{r}}={{\text{X}}}^{\prime}{{\text{y}}}^{h},\ {\text{G}}={{\text{D}}}^{\prime}{\text{D}},\ {{{\text{y}}}^{h}}^{\prime}{y}^{h}=1\), where \({\text{D}}\) is the \(\left({\text{p}}-1\right)\times \left({\text{p}}+1\right)\) second-difference matrix defined below

$${\text{D}}=\left[\begin{array}{cccccc}1& -2& 1& 0& .& 0\\ 0& 1& -2& 1& .& 0\\ .& .& .& .& .& .\\ 0& .& .& 1& -2& 1\end{array}\right]\left({\text{p}}-1\right)\times \left({\text{p}}+1\right).$$
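For concreteness, the following sketch (in Python with NumPy; the function names are ours and purely illustrative) builds the second-difference matrix \({\text{D}}\), the penalty matrix \({\text{G}}={{\text{D}}}^{\prime}{\text{D}}\), and the estimator in Eq. (4) for given standardized data.

```python
import numpy as np

def second_difference_matrix(m):
    """Build the (m-2) x m second-difference matrix D displayed above (m = p + 1)."""
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def sls1(X, y, lam):
    """Smoothed least squares estimator of Eq. (4): (C + lambda*G)^{-1} r."""
    C = X.T @ X                             # correlation matrix of the standardized regressors
    r = X.T @ y                             # correlations of the regressors with y^h
    D = second_difference_matrix(X.shape[1])
    G = D.T @ D                             # penalty matrix G = D'D
    return np.linalg.solve(C + lam * G, r)  # (C + lambda*G)^{-1} r
```

With \(\uplambda =0\) (and a nonsingular \({\text{C}}\)), the call reduces to the unrestricted OLS solution discussed next.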

This estimator is denoted as the \({SLS}_{1}\) estimator in this paper. If \(\uplambda =0\), then this estimator reduces to \(\widetilde{\upbeta }={\left({\text{C}}\right)}^{-1}{\text{r}}\), that is, the unrestricted MIDAS introduced by Foroni et al. (2015), and, as discussed in Lipovetsky and Conklin (2005), the least squares objective function is minimized in this situation. Therefore, the \({SLS}_{1}\) estimator solves the issue of overparameterization by introducing the biasing parameter \(\uplambda\), but at the same time, it decreases the goodness-of-fit. By assuming orthogonality between the high-frequency regressors and the error term (\({{\text{X}}}^{\prime}\upvarepsilon =0\)), this may be shown in the MIDAS context using the following identities:

$${{\text{R}}}^{2}+{{\text{S}}}^{2}=1$$
(5)

which can also be written as

$${{\text{R}}}^{2}=1-{{\text{S}}}^{2}={\upbeta }^{\mathrm{^{\prime}}}{\text{r}}={\upbeta }^{\mathrm{^{\prime}}}\mathrm{C\beta }$$
(5a)

and it can be expressed as

$${{\text{R}}}^{2}=1-{{\text{S}}}^{2}={2\upbeta }^{\prime}{\text{r}}-{\upbeta }^{\prime}\mathrm{C\beta }={\upbeta }^{\prime}\left(2{\text{r}}-\mathrm{C\beta }\right),$$
$${{\text{S}}}^{2}=1-{{\text{R}}}^{2}=1-{{\widetilde{{\text{y}}}}^{h}}^{\prime}{\widetilde{{\text{y}}}}^{h}={{y}^{h}}^{\prime}{y}^{h}-{{\widetilde{{\text{y}}}}^{h}}^{\prime}{\widetilde{{\text{y}}}}^{h}={{y}^{h}}^{\prime}\varepsilon .$$
(5b)

Equation (5) exhibits the Pythagorean theorem, where \({{\text{R}}}^{2}\) is the share of the variation explained by the regression and \({{\text{S}}}^{2}\) is the unexplained share. The trivial expressions above hold for the U-MIDAS model, but for the \({SLS}_{1}\) estimator, \({{\text{R}}}^{2}\) corresponds to:

$${R}_{{SLS}_{1}}^{2}=2{r}{\prime}{(C+\lambda G)}^{-1}r-{r}{\prime}{\left(C+\lambda G\right)}^{-1}C{(C+\lambda G)}^{-1}r$$
(5c)

which is lower than the \({{\text{R}}}^{2}\) for the U-MIDAS model (see the Supplementary Material for detailed derivations). Hence, the \({SLS}_{1}\) estimator yields a smoothed version of the unrestricted parameter vector, which improves the forecasting properties compared to U-MIDAS. However, it does so by shrinking the unrestricted parameter vector toward zero, which decreases the goodness-of-fit. This drawback of shrinkage estimators such as \({SLS}_{1}\) led Lipovetsky and Conklin (2005) to introduce a two-parameter estimator that increases the \({{\text{R}}}^{2}\). Additionally, it decreases the MSE of the estimated parameter vector, which decreases the estimation error (see Sect. 2.2 for detailed derivations). In a forecasting context, this has the potential to decrease the out-of-sample MSE of the forecasts, since the forecast error can be decomposed into an estimation error and a random error. Therefore, in the same manner as Lipovetsky and Conklin (2005), we introduce a two-parameter \({SLS}_{2}\) estimator by generalizing Eq. (3) with two additional terms so that the objective function can be represented as:

$$\widetilde{{\text{S}}}\left({\mathrm{\alpha }}_{0},\upbeta \right)={\Vert {{\text{y}}}^{{\text{h}}}-\mathrm{X\beta }\Vert }^{2}+{\uplambda }_{1}\sum_{j=2}^{p}{\left({\nabla }^{2}{\upbeta }_{j}\right)}^{2}+{\uplambda }_{2}{\Vert {\text{r}}-\upbeta \Vert }^{2}+{\uplambda }_{3}{\left({{{\text{y}}}^{{\text{h}}}}^{\prime}\left({{\text{y}}}^{{\text{h}}}-\mathrm{X\beta }\right)\right)}^{2}.$$
(6)

The first term in Eq. (6) represents the least squares minimization of the residuals, whereas the second term, with parameter \({\uplambda }_{1}\), smooths the estimates in the same fashion as \({SLS}_{1}\). The third term, with parameter \({\uplambda }_{2}\), corresponds to minimizing the deviation of the estimates \(\upbeta\) from the pairwise correlations \({\text{r}}\) with the dependent variable. The benefit of including this term is that we end up with interpretable coefficients that have the same sign as the pairwise correlations. The fourth term, with parameter \({\uplambda }_{3}\), minimizes the residual term in Eq. (5b). The net effect of including one more variable in the model corresponds to \({{\text{R}}}^{2}={\upbeta }^{\prime}{\text{r}}=\sum_{j=1}^{p}{\upbeta }_{j}{r}_{j}=\sum_{j=1}^{p}{R}_{j}^{2}\), where \({\upbeta }_{j}\) and \({r}_{j}\) are the elements of the vectors \(\upbeta ={{\text{C}}}^{-1}r\), the standardized coefficients of regression Eq. (1), and \({\text{r}}={{\text{X}}}^{\prime}{{\text{y}}}^{h}\), respectively. Therefore, this parameter corresponds to obtaining a maximum coefficient of multiple determination, which will lead to a better fit. A further practical advantage of this solution emerged when we performed numerous estimations on various real data sets, running the calculations over a grid of values of the parameter \({\text{q}}\) and evaluating, in each estimation, the coefficient of multiple determination in its general form given in Eq. (5a). By minimizing Eq. (6), we obtain the following estimator:

$${\widehat{\upbeta }}_{{\text{q}}}\left(\uplambda \right)={\text{q}}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{r}},$$
(7)

where \(\uplambda\) and \({\text{q}}\) are two parameters that should be estimated based on the data (a detailed description of the estimation is provided in Sect. 2.3). Notably, the two-parameter \({SLS}_{2}\) estimator proposed in Eq. (7) differs from the estimator in Eq. (4) through the second parameter \({\text{q}}\). Thus, the solution of the two-parameter estimator is proportional to the one-parameter estimator given in Eq. (4). This minor change leads to an increased goodness-of-fit. For a fixed value of \(\uplambda\), the value of the parameter \({\text{q}}\) can be calculated by maximizing the quality of the regression, measured by the coefficient of multiple determination. By substituting Eq. (7) into (5a), we obtain the following expression for the coefficient of multiple determination:

$${{\text{R}}}^{2}=2{{\text{qr}}}^{\prime}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{r}}-{{\text{q}}}^{2}{{\text{r}}}^{\prime}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{C}}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{r}}=2{{\text{qQ}}}_{1}-{{\text{q}}}^{2}{{\text{Q}}}_{2}$$
(8)

which can be solved for q:

$${\text{q}}=\frac{{{\text{Q}}}_{1}}{{{\text{Q}}}_{2}}=\frac{{{\text{r}}}^{\prime}{({\text{C}}+\mathrm{\lambda G})}^{-1}{\text{r}}}{{{\text{r}}}^{\prime}{({\text{C}}+\mathrm{\lambda G})}^{-1}{\text{C}}{({\text{C}}+\mathrm{\lambda G})}^{-1}{\text{r}}}.$$
(9)

Here, \({{\text{Q}}}_{1}\) and \({{\text{Q}}}_{2}\) denote the two quadratic forms in Eq. (8). Equation (9) gives the value of \({\text{q}}\) at which this function reaches its maximum, and this value depends on the parameter \(\uplambda\). Substituting \({\text{q}}\) from Eq. (9) into (8) yields \({{\text{R}}}^{2}= {{\text{r}}}^{\prime}\upbeta = {\upbeta }^{\prime}\mathrm{C\beta }\). This expression shows that \({{\text{R}}}^{2}\) takes the form given in Eq. (5a). Hence, in the \({SLS}_{2}\) solution, the smoothing parameter \(\uplambda\) regularizes the regression model, and the parameter \({\text{q}}\) is used to improve the quality of the fit. In addition, the proposed \({SLS}_{2}\) estimator given in Eq. (7) satisfies the condition of orthogonality and improves the goodness-of-fit of the regression model by increasing the \({{\text{R}}}^{2}\). The \({SLS}_{2}\) estimator can be interpreted as a two-parameter ridge shrinkage estimator and can also be interpreted in a Bayesian manner (see Lipovetsky and Conklin 2005 for details).
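To make the construction explicit, the following sketch (Python/NumPy; names are ours) computes \({\text{q}}\) from Eq. (9) and the resulting \({SLS}_{2}\) estimate from Eq. (7), together with the \({{\text{R}}}^{2}\) of Eq. (8), for a prespecified \(\uplambda\).

```python
import numpy as np

def sls2(X, y, lam):
    """Two-parameter SLS_2 estimator: q from Eq. (9) times the SLS_1 solution, Eq. (7)."""
    C = X.T @ X
    r = X.T @ y
    D = np.diff(np.eye(X.shape[1]), n=2, axis=0)   # (p-1) x (p+1) second-difference matrix
    G = D.T @ D
    b1 = np.linalg.solve(C + lam * G, r)           # SLS_1 estimate, (C + lambda*G)^{-1} r
    Q1 = r @ b1                                    # r'(C + lambda*G)^{-1} r
    Q2 = b1 @ C @ b1                               # r'(C + lambda*G)^{-1} C (C + lambda*G)^{-1} r
    q = Q1 / Q2                                    # Eq. (9)
    beta_q = q * b1                                # Eq. (7)
    r2 = 2 * q * Q1 - q**2 * Q2                    # Eq. (8), the in-sample R^2 at the optimal q
    return beta_q, q, r2
```

The returned \({{\text{R}}}^{2}\) equals \({{\text{Q}}}_{1}^{2}/{{\text{Q}}}_{2}\), which is never smaller than the \({{\text{R}}}^{2}\) of the corresponding \({SLS}_{1}\) fit (obtained by setting \(q=1\)).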

2.2 Matrix mean squared error comparison

The MSE criterion for the parameter vector is important since the forecasting error can be decomposed into an estimation error and a random error term. The estimation error can be decreased using an improved estimation technique that decreases the MSE, such as the \({SLS}_{2}\) estimator. Therefore, we compare the proposed \({SLS}_{2}\) estimator with the previously mentioned \({SLS}_{1}\) estimator under the MSE criterion. The general expression for the MSE of an estimator \(\widetilde{\beta }\) of the true parameter vector \(\beta\) is:

$$MSE\left( \widetilde{\beta } \right)=E\left[\left(\widetilde{\beta }-\beta \right){(\widetilde{\beta }-\beta )}{\prime}\right]=Var\left(\widetilde{\beta }\right)+Bias\left(\widetilde{\beta }\right){Bias(\widetilde{\beta })}{\prime},$$

where \(Var\left(\widetilde{\beta }\right)= E\left[\left(\widetilde{\beta }-E\left(\widetilde{\beta }\right)\right){\left(\widetilde{\beta }-E\left(\widetilde{\beta }\right)\right)}^{\prime}\right]\) and \(Bias\left(\widetilde{\beta }\right)=E\left(\widetilde{\beta }\right)- \beta\). In this section, we compare the theoretical performance of \({SLS}_{1}\) with that of our proposed estimator \({SLS}_{2}\). By using the spectral decomposition \(C=P\Lambda {P}^{\prime}\), where \(\Lambda\) is a diagonal matrix whose diagonal elements are the eigenvalues of the \({X}^{\prime}X\) matrix and \(P\) is the matrix whose columns are the corresponding eigenvectors of the \({X}^{\prime}X\) matrix, the OLS, \({SLS}_{1}\) and \({SLS}_{2}\) estimators can be rewritten as

$$\widehat{\beta }=\left(P{\Lambda }^{-1}{P}{\prime}\right)r,$$
(10)
$$\widetilde{\beta }\left(\lambda \right)={\left(P\Lambda {P}{\prime}+\lambda G\right)}^{-1}r,$$
(11)
$${\widetilde{\beta }}_{q}\left(\lambda \right)=q{\left(P\Lambda {P}{\prime}+\lambda G\right)}^{-1}r.$$
(12)

The corresponding MSEs of these estimators are respectively,

$$MSE\left(\widehat{\beta }\right)={\sigma }^{2} P{\Lambda }^{-1}{P}{\prime},$$
(13)
$$MSE\left(\widetilde{\beta }\left(\lambda \right)\right)={\sigma }^{2} {G}_{\lambda }+\left[Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\right]{\left[Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\right]}^{\prime},$$
(14)
$$MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)={q}^{2}{\sigma }^{2} {G}_{\lambda }+\left[Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\right]{\left[Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\right]}^{\prime},$$
(15)

where \({G}_{\lambda }={\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}P\Lambda {P}^{\prime}{\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}\), \(Bias \left(\widetilde{\beta }\left(\lambda \right)\right)= -\lambda {\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}G\beta\) and \(Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)=\left[q{\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}P\Lambda {P}^{\prime}-I\right] \beta .\)
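As an illustration of Eqs. (13)–(15), the sketch below (Python/NumPy; the toy inputs \(\beta\), \({\sigma }^{2}\), \(\lambda\) and \(q\) are supplied by the user and are ours) evaluates the three MSE matrices for a given design matrix, following the bias expressions stated above, so that, for example, their traces can be compared.

```python
import numpy as np

def mse_matrices(X, beta, sigma2, lam, q):
    """MSE matrices of the OLS/U-MIDAS, SLS_1 and SLS_2 estimators, Eqs. (13)-(15)."""
    C = X.T @ X
    D = np.diff(np.eye(X.shape[1]), n=2, axis=0)   # second-difference matrix D
    G = D.T @ D
    A_inv = np.linalg.inv(C + lam * G)
    I = np.eye(C.shape[0])

    G_lam = A_inv @ C @ A_inv                       # (C + lambda*G)^{-1} C (C + lambda*G)^{-1}
    mse_ols = sigma2 * np.linalg.inv(C)             # Eq. (13)

    bias1 = -lam * A_inv @ G @ beta                 # bias of the SLS_1 estimator
    mse_sls1 = sigma2 * G_lam + np.outer(bias1, bias1)          # Eq. (14)

    bias2 = (q * A_inv @ C - I) @ beta              # bias of the SLS_2 estimator
    mse_sls2 = q**2 * sigma2 * G_lam + np.outer(bias2, bias2)   # Eq. (15)
    return mse_ols, mse_sls1, mse_sls2
```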

A comparison of the MSEs can be conducted using the quality-of-fit parameter \(q>1\), which is attained by maximizing the multiple correlation coefficient (see Lipovetsky 2006). Furthermore, the following corollaries of Farebrother (1976) and Trenkler and Toutenburg (1990) are used to compare them. Theorem 2.1 then provides the necessary and sufficient condition for comparing the MSE of unrestricted MIDAS and \({SLS}_{2}\) for a fixed value of \(q\). The empirical relevance is that if \(q\) is greater than 1, then \(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\) will be lower than \(MSE\left(\widehat{\beta }\right)\). This is a natural result since, when \(q<1\), the parameters of \({SLS}_{2}\) are pushed further toward zero, which decreases the in-sample fit. Therefore, for values of \(q\) greater than 1, we can expect a lower MSE, which decreases the estimation error and leads to more efficient forecasting results. Theorems 2.2 and 2.3 compare the variances and MSEs of \({SLS}_{1}\) and \({SLS}_{2}\) and give the corresponding positive definiteness conditions. A positive definite variance difference is attained if \(\lambda >0\) and \(q>1\); if \(q=1\), then \({SLS}_{1}\) and \({SLS}_{2}\) become identical. Theorem 2.3 then gives the condition under which the difference between the MSE matrices of \({SLS}_{2}\) and \({SLS}_{1}\) is positive definite.

Corollary 2.1

(Farebrother 1976). Let \(A\) be a positive definite matrix, \(a\) a nonzero vector, and \(\theta\) a positive scalar. Then, the necessary and sufficient condition for \(\theta A-a{a}^{\prime}>0\) is that \({a}^{\prime}{A}^{-1}a< \theta\).

Corollary 2.2

(Trenkler and Toutenburg 1990). Let \({\widetilde{\beta }}_{1}\) and \({\widetilde{\beta }}_{2}\) be two estimators of \(\beta\). Let \(H=Var\left({\widetilde{\beta }}_{1}\right)-Var ({\widetilde{\beta }}_{2})\) be a positive definite matrix, \({h}_{1}=Bias ({\widetilde{\beta }}_{1})\) and \({h}_{2}=Bias ({\widetilde{\beta }}_{2})\).

Then, \(MSE \left({\widetilde{\beta }}_{1}\right)-MSE \left({\widetilde{\beta }}_{2}\right)>0\) if and only if \({{h}_{2}}^{\prime}{\left(H+{h}_{1}{{h}_{1}}^{\prime}\right)}^{-1}{h}_{2}<1.\)

Theorem 2.1: Necessary and sufficient conditions for the estimator

For a fixed value of \(q\), Theorem 2.1 provides the criterion for comparing the MSE of the unrestricted MIDAS estimator (the \(\lambda =0\) case) with that of \({SLS}_{2}\).

Let \(\lambda >{\lambda }_{1j}\) and \({b}_{2}=Bias ({\widetilde{\beta }}_{q}\left(\lambda \right))\), where \({\lambda }_{1j}={k}_{j}\left(q-1\right)\), \(j=1,\dots ,p\), and \({k}_{j}\) denote the eigenvalues of \(C\). The necessary and sufficient condition for \(MSE\left(\widehat{\beta }\right)-MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)>0\) is \({{b}_{2}}^{\prime}{\left({C}^{-1}-{q}^{2}{G}_{\lambda }\right)}^{-1}{b}_{2}<{\sigma }^{2}\).

Proof

The difference between the MSE matrices of the estimators in Eqs. (10) and (12) is

$$MSE\left(\widehat{\beta }\right)-MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)= {\sigma }^{2}\left({C}^{-1}-{q}^{2}{G}_{\lambda }\right){-b}_{2}{{b}_{2}}^{\prime}={\sigma }^{2}\left[P\, diag{\left\{\frac{1}{{k}_{j}}-\frac{{q}^{2}{k}_{j}}{{\left({k}_{j}+\lambda \right)}^{2}}\right\}}_{j=1}^{p}{P}^{\prime}\right]{-b}_{2}{{b}_{2}}^{\prime},$$

where \(\left({C}^{-1}-{q}^{2}{G}_{\lambda }\right)\) is positive definite if \({\lambda }^{2}+2{k}_{j}\lambda +\left(1-{q}^{2}\right){{k}_{j}}^{2}>0\) for all \(j\). Now, \({\lambda }^{2}+2{k}_{j}\lambda +\left(1-{q}^{2}\right){{k}_{j}}^{2}\) is positive when \(\lambda > {\lambda }_{1j}={k}_{j}(q-1)\). By Corollary 2.1, the proof of the theorem is complete.

Theorem 2.2: Positive definite matrix condition

Let \({b}_{1}=Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\) and \({b}_{2}= Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\). If \({{b}_{1}}^{\prime}{G}_{\lambda }^{-1}{b}_{1}< {\sigma }^{2}\left({q}^{2}-1\right)\), then \(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)>0\).

Proof

\(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)=Var\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-Var\left(\widetilde{\beta }\left(\lambda \right)\right)+{b}_{2}{{b}_{2}}{\prime}-{b}_{1}{{b}_{1}}{\prime}={\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }+{b}_{2}{{b}_{2}}{\prime}-{b}_{1}{{b}_{1}}{\prime}.\)

\({\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }\) is always a positive definite matrix for \(\lambda >0\) and \(q>1\). Furthermore, \({b}_{2}{{b}_{2}}^{\prime}\) is positive semidefinite. Therefore, the question of whether \(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)\) is a positive definite matrix reduces to that of whether \({\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }-{b}_{1}{{b}_{1}}^{\prime}\) is a positive definite matrix. Accordingly, by Corollary 2.1 and the condition of the theorem, \({\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }-{b}_{1}{{b}_{1}}^{\prime}\) is a positive definite matrix; thus, the proof is complete.

Theorem 2.3: Comparison of the two estimators’ sampling variances

Let \({b}_{1}=Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\) and \({b}_{2}= Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\). Then

$$MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)>0\ \text{ if and only if }\ {{b}_{1}}^{\prime}{\left({\sigma }^{2}\left({q}^{2}-1\right){G}_{\lambda }+{b}_{2}{{b}_{2}}^{\prime}\right)}^{-1}{b}_{1}<1.$$

Proof

For \(\lambda >0\) and \(q>1\), \(Var\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)- Var\left(\widetilde{\beta }\left(\lambda \right)\right)= {\sigma }^{2}({q}^{2}-1){G}_{\lambda }\), which is a positive definite matrix. Therefore, using Corollary 2.2, the proof is complete.

2.3 Estimation of the smoothing parameter \({\varvec{\lambda}}\) and the parameter \(q\)

In this section, we demonstrate how to estimate the smoothing parameter \(\lambda\) and the parameter \(q\) of the \({SLS}_{2}\) estimator. A potential approach for \(\lambda\) is cross-validation with the \(q\) value fixed. Another commonly used method is to choose the smoothing parameter \(\lambda\) by minimizing the AIC (Breitung and Roling 2015). In this article, we rely on the Breitung and Roling (2015) method by first minimizing the AIC. We then improve the fit of the regression model using the additional parameter \(q\), and we observe more reliable results and a better fit of the regression model with this additional parameter. Additionally, we outline the criteria used for selecting the smoothing parameter. We define \(\Lambda =diag({k}_{1},\dots ,{k}_{p})\) as a diagonal matrix whose diagonal elements are the eigenvalues of \({X}^{\prime}X\), whereas \(\Upsilon\) is a \(p\times p\) matrix whose columns are the eigenvectors of the matrix \({X}^{\prime}X\), fulfilling \({\Upsilon }^{\prime}{X}^{\prime}X\Upsilon =\Lambda\) and \({\Upsilon }^{\prime}\Upsilon =I\). Subsequently, the original model can be written in canonical form,

$${y}^{h}=Z\alpha +{\varepsilon }_{t+h},$$
(16)

where \(Z=X\Upsilon\), \(\alpha ={\Upsilon }^{\prime}\beta\), and \({Z}^{\prime}Z={\Upsilon }^{\prime}{X}^{\prime}X\Upsilon =\Lambda\). Then, \({\widehat{\alpha }}_{q}\left(\lambda \right)={\Upsilon }^{\prime}{\widetilde{\beta }}_{q}\left(\lambda \right)\), and \(MSE\left({\widehat{\alpha }}_{q}\left(\lambda \right)\right)={\Upsilon }^{\prime}MSE({\widetilde{\beta }}_{q}(\lambda ))\Upsilon\). Therefore, the MSE of the estimator \({\widehat{\alpha }}_{q}\left(\lambda \right)\) can be written as

$$MSE\left({\widehat{\alpha }}_{q}\left(\lambda \right)\right)= {q}^{2}{\sigma }^{2}{\left(\Lambda +\lambda G\right)}^{-1}\Lambda {\left(\Lambda +\lambda G\right)}^{-1}+\left[q{\left(\Lambda +\lambda G\right)}^{-1}\Lambda -I\right]\alpha {\alpha }^{\prime}{\left[q{\left(\Lambda +\lambda G\right)}^{-1}\Lambda -I\right]}^{\prime}.$$

The next step is then to obtain the optimal values of the parameters \(\lambda\) and \(q\) by minimizing the following function:

$$f\left(\lambda ,q\right)=trace\left(MSE\left({\widehat{\alpha }}_{q}\left(\lambda \right)\right)\right)=\sum_{i=1}^{p}\frac{{q}^{2}{\sigma }^{2}{k}_{i}+{\alpha }_{i}^{2}{(q{k}_{i}-{k}_{i}-\lambda )}^{2}}{{({k}_{i}+\lambda )}^{2}}.$$
(17)

Here, \(f\left(\lambda ,q\right)\) is a quadratic function of the parameter \(q\), so the value of \(q\) can be derived by differentiating \(f\left(\lambda ,q\right)\) with respect to \(q\) for a fixed \(\lambda\) and setting the derivative equal to zero. After the parameters \({\sigma }^{2}\) and \({\alpha }_{i}^{2}\) are replaced by their corresponding unbiased estimators \({\widehat{\sigma }}^{2}\) and \({\widehat{\alpha }}_{i}^{2}\), we obtain the optimal estimator of \(q\) for a fixed value of \(\lambda\) as

$$\frac{\partial f\left(\lambda ,q\right)}{\partial q}=\sum_{i=1}^{p}\frac{2q{\sigma }^{2}{k}_{i}+2{\alpha }_{i}^{2}{k}_{i}\left(q{k}_{i}-{k}_{i}-\lambda \right)}{{\left({k}_{i}+\lambda \right)}^{2}}=0,$$
$${\widehat{q}}_{opt}=\frac{\sum_{i=1}^{p}\frac{{\widehat{\alpha }}_{i}^{2}{k}_{i}}{{k}_{i}+\lambda }}{\sum_{i=1}^{p}\frac{{\widehat{\sigma }}^{2}{k}_{i}+{\widehat{\alpha }}_{i}^{2}{k}_{i}^{2}}{{{(k}_{i}+\lambda )}^{2}}}.$$
(18)

Similarly, we may derive the value of \(\lambda\) that minimizes \(f(\lambda ,q)\) by differentiating with respect to \(\lambda\) and setting the derivative equal to zero, with the \(q\) value fixed and the parameters \({\alpha }_{i}^{2}\) and \({\sigma }^{2}\) replaced by their unbiased estimates. The unbiased estimates of \({\alpha }_{i}^{2}\) and \({\sigma }^{2}\) can be obtained by initiating the iterative procedure explained in detail in Sect. 7 of Hoerl and Kennard (1970). The optimal value of \(\lambda\) corresponds to:

$${\widehat{\lambda }}_{opt}=\frac{q\sum_{i=1}^{p}{\widehat{\sigma }}^{2}{k}_{i}+(q-1)\sum_{i=1}^{p}{{\widehat{\alpha }}_{i}}^{2}{k}_{i}}{\sum_{i=1}^{p}{{\widehat{\alpha }}_{i}}^{2}{k}_{i}}.$$
(19)

In practice, the parameters \(\lambda\) and \(q\) depend on each other. As a first step, we therefore estimate the optimal value of \(\lambda\) by minimizing the Akaike information criterion (AIC), as suggested by Breitung and Roling (2015). Subsequently, we calculate the optimal value of \(q\) based on the estimated value of \(\lambda\).
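A minimal sketch of this two-step procedure is given below (Python/NumPy; function names are ours). It assumes that the AIC for the penalized fit is computed with effective degrees of freedom equal to the trace of the smoother matrix \(X{\left({X}^{\prime}X+\lambda G\right)}^{-1}{X}^{\prime}\), which is one common convention and may differ in detail from the criterion used by Breitung and Roling (2015), and that the unrestricted canonical coefficients \({\widehat{\alpha }}_{i}\) can be estimated by least squares.

```python
import numpy as np

def choose_lambda_aic(X, y, grid):
    """Step 1: choose lambda by minimizing an AIC over a grid of candidate values."""
    T = len(y)
    D = np.diff(np.eye(X.shape[1]), n=2, axis=0)
    G = D.T @ D
    best_lam, best_aic = grid[0], np.inf
    for lam in grid:
        S = X @ np.linalg.solve(X.T @ X + lam * G, X.T)    # smoother ("hat") matrix
        resid = y - S @ y
        aic = T * np.log(resid @ resid / T) + 2.0 * np.trace(S)
        if aic < best_aic:
            best_lam, best_aic = lam, aic
    return best_lam

def choose_q(X, y, lam):
    """Step 2: q_opt from Eq. (18), computed in the canonical coordinates of Eq. (16)."""
    k, P = np.linalg.eigh(X.T @ X)                         # eigenvalues k_i and eigenvectors of X'X
    Z = X @ P
    alpha_hat = np.linalg.lstsq(Z, y, rcond=None)[0]       # unrestricted canonical coefficients
    resid = y - Z @ alpha_hat
    sigma2_hat = resid @ resid / (len(y) - Z.shape[1])     # error-variance estimate
    num = np.sum(alpha_hat**2 * k / (k + lam))
    den = np.sum((sigma2_hat * k + alpha_hat**2 * k**2) / (k + lam)**2)
    return num / den                                       # Eq. (18)
```

The resulting pair \((\widehat{\lambda },\widehat{q})\) can then be plugged into Eq. (7), e.g. as \(\widehat{q}\,{\left(C+\widehat{\lambda }G\right)}^{-1}r\).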

3 Simulation and estimation performance

3.1 Design of the experiment

In this section, we compare the small-sample properties of the nonparametric \({SLS}_{1}\) estimator and our two-parameter \({SLS}_{2}\) estimator of the MIDAS regression. For this comparative study, following Andreou et al. (2010), we generate data from the following equations:

$${y}_{t+h}={\beta }_{0}+\sum_{j=0}^{p}{\beta }_{j}{x}_{t,j}+{\varepsilon }_{t+h},$$
(20)
$${\beta }_{j}={\alpha }_{1}{\omega }_{j}\left(\theta \right),$$
(21)
$${\varepsilon }_{t+h}\sim {{\text{N}}}_{iid}\left(\mathrm{0,0.125}\right),$$
(22)

where \(t=1,2,\dots ,T\), \({\beta }_{0}=0.5\), and \({\omega }_{j}\left(\cdot \right)\) is a weighting function that can be chosen from the several specifications presented below. The high-frequency regressor is generated by the \(AR(1)\) process given below:

$${x}_{t,j}={\alpha }_{0}+\varrho {x}_{t,j-1}+{\varepsilon }_{j,t}, {\varepsilon }_{j,t}\sim iid N\left(\mathrm{0,1}\right),$$
(23)

where \(j=0,1,\dots ,p\) and \({x}_{t,j-p-k}={x}_{t-1,j-k}\) for all \(k>0\). Correspondingly, \({x}_{t,j}\) denotes the \(j\)th lag of the \(AR\left(1\right)\) series \({x}_{t,0}\). As suggested by Andreou et al. (2010), we use \({\alpha }_{0}=0.5\) and \(\varrho =0.9\). The MSE of the proposed estimator can be obtained as \(E\left[{\left({\widetilde{\beta }}_{q}\left(\lambda \right)-\beta \right)}^{\prime}\left({\widetilde{\beta }}_{q}\left(\lambda \right)-\beta \right)\right]=q\left(\left({\beta }^{\prime}{{\Psi }_{T}}^{\prime}\left(\lambda \right){\Psi }_{T}\left(\lambda \right)\beta \right)+{\sigma }_{u}^{2}\,trace\left[{\left({X}^{\prime}X+\lambda G\right)}^{-1}{X}^{\prime}X{\left({X}^{\prime}X+\lambda G\right)}^{-1}\right]\right)\), where \({\Psi }_{T}\left(\lambda \right)= \lambda {\left({X}^{\prime}X+\lambda G\right)}^{-1}G\). The chosen sample sizes for the model in (20)–(22) are \(T \in \left\{100, 200, 400\right\}\), and the considered numbers of high-frequency lags are \(\left(p+1\right)\in \left\{20, 40, 60\right\}\). The scale parameter in Eq. (21) is chosen as \({\alpha }_{1}\in \left\{0.2, 0.3, 0.4\right\}\) to model small, medium, and large signal-to-noise ratios. However, the implied \({R}^{2}\) responds slightly differently to different sample sizes, numbers of high-frequency lags, and weighting functions. These signal-to-noise ratios approximately satisfy \(0.40\le {R}^{2}<0.50\), \(0.50\le {R}^{2}\le 0.70\), and \({R}^{2}>0.70\), respectively. The number of Monte-Carlo replications is 5000.
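For reference, one replication of this data-generating process can be sketched as follows (Python/NumPy; function and variable names are ours, the weighting functions \({\omega }_{j}\left(\theta \right)\) entering Eq. (21) are specified in the three experiments below, and the value 0.125 in Eq. (22) is treated as the error variance).

```python
import numpy as np

def simulate_midas_sample(T, weights, alpha0=0.5, rho=0.9, beta0=0.5,
                          sigma2_eps=0.125, seed=None):
    """One replication of the DGP in Eqs. (20)-(23); `weights` holds beta_j = alpha_1 * omega_j(theta)."""
    rng = np.random.default_rng(seed)
    m = len(weights)                              # m = p + 1 intraperiod observations per period
    n_high = T * m + m                            # one extra low-frequency period as burn-in
    x = np.empty(n_high)
    x[0] = alpha0 / (1.0 - rho)                   # start the AR(1) at its unconditional mean
    for s in range(1, n_high):                    # high-frequency AR(1) regressor, Eq. (23)
        x[s] = alpha0 + rho * x[s - 1] + rng.standard_normal()
    X = np.empty((T, m))
    for t in range(T):
        end = m * (t + 2)                         # position just after the last obs of period t
        X[t, :] = x[end - 1:end - 1 - m:-1]       # j runs backwards: j = 0 is the most recent obs
    y = beta0 + X @ weights + rng.normal(0.0, np.sqrt(sigma2_eps), size=T)   # Eqs. (20)-(22)
    return X, y
```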

The Monte-Carlo analysis considers lag distributions that fit both the nonparametric and parametric MIDAS frameworks (Breitung and Roling 2015). Three different weighting functions are considered and presented below. They are meant to cover different types of behavior of the high-frequency weights, including swift decay, where many weights are approximately equal to zero beyond some lag, and slow decay. The initial experiment is conducted by applying the exponential (slow decay) weighting function given as

$${\omega }_{j}\left(\theta \right)=\frac{{\text{exp}}({\theta }_{1}j+{\theta }_{2}{j}^{2})}{\sum_{i=0}^{p}{\text{exp}}({\theta }_{1}i+{\theta }_{2}{i}^{2})}, j=\mathrm{0,1}\dots p.$$

We follow Andreou et al. (2010) and set \({\theta }_{1}=7\times {10}^{-4}\) and \({\theta }_{2}=-6\times {10}^{-3}\). The second experiment employs a hump-shaped weighting function.

$${\omega }_{j}\left(\theta \right)=\frac{{\text{exp}}({\theta }_{1}j+{\theta }_{2}{j}^{2})}{\sum_{i=0}^{p}{\text{exp}}({\theta }_{1}i+{\theta }_{2}{i}^{2})}, j=\mathrm{0,1}\dots p.$$

We set the parameters such that the weighting function reaches its maximum at \(j=6\), \(j=10\), and \(j=16\) when 20, 40, and 60 lags are retained, respectively. We choose \({\theta }_{1}=8\times {10}^{-2}\) and \({\theta }_{2}=-{\theta }_{1}/10\), \({\theta }_{2}=-{\theta }_{1}/20\), and \({\theta }_{2}=-{\theta }_{1}/30\). The third experiment is run by utilizing the sign-changing weighting function given as

$${\omega }_{j}\left({c}_{1},{c}_{2}\right)=\frac{{c}_{1}}{p+1}\left[{\text{sin}}\left({c}_{2}+\frac{j2\pi }{p}\right)\right],$$

where \({c}_{2}=1\times {10}^{-2}\) and \({c}_{1}=5\), \({c}_{1}=2.5\), and \({c}_{1}= 5/3\) for lags 20, 40, and 60, respectively, ensuring that these weights sum to one. The simulation results are given in Table 1.

Table 1 In-sample RMSE ratios

3.2 Results and discussion

In Table 1, we present a comparison of the root mean squared error (RMSE) ratios between the \({SLS}_{2}\) and \({SLS}_{1}\) estimators. Since Breitung and Roling (2015) demonstrate that the \({SLS}_{1}\) nonparametric approach is superior to the usual parametric approach, we compare only the \({SLS}_{1}\) estimator to the improved two-parameter \({SLS}_{2}\) estimator. The results in Table 1 are summarized as follows. First, the proposed nonparametric \({SLS}_{2}\) estimator with the additional parameter \(q\) dominates the \({SLS}_{1}\) nonparametric approach for in-sample estimation in almost all situations considered. This improvement is often very substantial, with a relative RMSE close to zero. Clearly, the proposed \({SLS}_{2}\) estimator outperforms the \({SLS}_{1}\) estimator for in-sample estimation with hump-shaped weighting functions. For exponentially declining weights and hump-shaped weights, our new method dominates the \({SLS}_{1}\) one-parameter estimator suggested by Breitung and Roling (2015) in all situations. For sign-changing weights, the proposed two-parameter \({SLS}_{2}\) estimator performs comparably well, except when \(p+1=20\): our estimator yields a lower RMSE than that of Breitung and Roling (2015) in most situations, but when the lag length equals 20 and \({\alpha }_{1}=0.3\) or \({\alpha }_{1}=0.4\), the \({SLS}_{1}\) estimator is more efficient in terms of RMSE in almost all situations considered. Moreover, Table 1 indicates that as the sample size increases, the \({\widehat{q}}_{opt}\) value decreases. Based on these simulation results, we believe that for large sample sizes, \({\widehat{q}}_{opt}\) will be close to one, which indicates a convergence between the \({SLS}_{1}\) and \({SLS}_{2}\) estimators. Additionally, we present a graphical representation of the RMSE values of Table 1 in Appendix 2 to enhance clarity and understanding of the results.

4 Empirical application

4.1 The performance measures

In this section, we present a real-world empirical application to assess the performance of our proposed method. In this example, we investigate the in-sample properties and the out-of-sample forecasts of the different estimation methods. A clear link exists between the in-sample and out-of-sample properties of the estimators, as described by Breitung and Roling (2015). The out-of-sample forecast error may be decomposed in the following way:

$${y}_{t+h}-{\widehat{y}}_{\left.t+h\right|t}=\left(E\left({y}_{t+h}\right)-{\widehat{y}}_{\left.t+h\right|t}\right)+{u}_{t+h},$$

where \(E\left({y}_{t+h}\right)-{\widehat{y}}_{\left.t+h\right|t}\) corresponds to the estimation error and \({u}_{t+h}\) to the error term. Our methods, as shown in the mean square error comparison in Sect. 2.2 and the results of the simulations in Sect. 3.2, decrease the estimation error and therefore have the potential to improve the out-of-sample forecasts. To obtain the out-of-sample forecasts, we regress the future values of the dependent variable (mathematically denoted as \({y}_{t+h}\)) on current or past values of the regressor. In the first step, we split the sample into an estimation sample \(t=1,\dots , {T}^{e}\) and a forecasting sample \(t={T}^{e}+1,\dots ,T,\) where \({n}_{f}=T-{T}^{e}\) denotes the number of forecasted values. Given the estimated lag distribution from the estimation sample, one-step-ahead forecasts are constructed. The first observation from the estimation sample is dropped, while the end of the sample is extended to include the observation in the next period. We assess forecasts according to the root mean squared forecast error:

$$RMSE= \sqrt{\frac{1}{{n}_{f}}\sum_{t={T}^{e}+1}^{{T}^{e}+{n}_{f}}{\left({y}_{t+h}-{\widehat{y}}_{\left.t+h\right|t}\right)}^{2}}.$$
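A sketch of this forecasting exercise is given below (Python/NumPy; names are ours, `estimate` is any estimator returning a coefficient vector, e.g. the sls1 or sls2 sketches above with a chosen \(\lambda\), and the rows of X and the entries of y are aligned as in Eq. (2)).

```python
import numpy as np

def window_rmse(X, y, T_e, estimate, rolling=False):
    """One-step-ahead forecasts with an expanding window (rolling window of size T_e if rolling=True)."""
    errors = []
    for t in range(T_e, len(y)):
        start = t - T_e if rolling else 0
        beta = estimate(X[start:t], y[start:t])    # re-estimate on the current estimation sample
        errors.append(y[t] - X[t] @ beta)          # one-step-ahead forecast error
    return np.sqrt(np.mean(np.square(errors)))     # root mean squared forecast error
```

For example, `window_rmse(X, y, T_e, lambda Xe, ye: sls2(Xe, ye, lam)[0])` evaluates the \({SLS}_{2}\) forecasts for a given \(\lambda\).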

4.2 In-sample estimation and forecasting of the inflation rate

Frequent forecasts of the inflation rate have become increasingly important in economic decision making. The empirical importance for policymakers stems from the fact that many contracts are set in nominal terms while real prices are of practical relevance. An important indicator of the inflation rate is oil returns. Breitung et al. (2013) analyzed the predictive power of 15 different daily indicators for the German inflation rate. The results of the different MIDAS regressions demonstrated that the predictive power of crude oil prices for the monthly inflation rate yielded an \({R}^{2}>0.3\), while the other potential indicators exhibited an \({R}^{2}>0.1\). These results are in line with those in Bruneau et al. (2007), where energy prices were shown to play a dominant role in inflation forecasting, while other indicators, such as exchange rates, were shown to be poor predictors of inflation changes. Therefore, in this empirical application, we choose to forecast the US inflation rate based on crude oil price data. We use US data from the Federal Reserve Bank of St. Louis. The monthly data start in 1986 and end in 2020, and the in-sample period spans from February 1986 to December 2012. We assess the predictive capabilities of different possible MIDAS methods, including our proposed method. We utilize expanding-window forecasts to closely mirror our simulation results; it should be noted that our methods can be used with other forecasting schemes as well, such as a rolling window. We perform one-step-ahead out-of-sample forecasts, and the out-of-sample period is from January 2012 to January 2020. The oil price data are daily WTI crude oil prices (regressor), and the regressand consists of seasonally adjusted monthly data. To avoid nonstationarity issues, we transform both variables into first differences of the logs, in line with Breitung and Roling (2015); therefore, we forecast the inflation rate. The regressand is shown in Fig. 1, and the regressor is shown in Fig. 2. We observe some peaks, especially during the 1990s, but no major structural changes. The in-sample results are shown in Table 2, where we see a decrease in the RMSE of the new estimator. This decrease is substantial, and the ratio is as small as 0.2172 for \(p+1=40\). Furthermore, as expected based on the results in Sect. 2, we observe a substantial increase in \({R}^{2}\). The results for \({R}^{2}\) are in line with those obtained by Breitung et al. (2013) and show the importance of oil returns as an indicator of the inflation rate. This indicates a considerable improvement in the in-sample properties. In Fig. 3, we present graphs of the unrestricted estimated parameters and the \({SLS}_{1}\) and \({SLS}_{2}\) estimates. The SLS-type estimators smooth the erratic behavior of the unrestricted estimator. For \(p+1=20\), there appears to be a linear relationship between the variables, but when the lag length increases to \(p+1=60\), the lag distribution becomes more hump-shaped. Moreover, \({SLS}_{1}\) shrinks the coefficients to be very close to zero, while \({SLS}_{2}\) estimates a larger impact of the regressor on the inflation rate. In Table 3, we show the results for \({SLS}_{1}\) and \({SLS}_{2}\); furthermore, as benchmarks, we forecast using U-MIDAS and the parametric Estimated Almon Lag Polynomial (EALP).

Fig. 1

Plot of the monthly inflation rate

Fig. 2

Plot of oil returns

Table 2 In-sample estimation of the monthly inflation rate based on oil returns
Fig. 3

a–c Estimated lag distribution of the monthly inflation rate with a \(p+1=20\), b \(p+1=40\), and c \(p+1=60\). \({SLS}_{1}\) (straight line), \({SLS}_{2}\) (dashed line), Unrestricted (dashed-dotted line)

Table 3 RMSE of the out-of-sample forecast (window forecasting)

Table 3 displays the results from using the expanding-window technique to assess out-of-sample performance. It is important to highlight that our approach is not limited to the expanding-window method; other techniques, such as the rolling window, can also be employed. The results presented in Table 3 demonstrate a significant enhancement in out-of-sample performance, as confirmed by the statistically significant Diebold and Mariano test. This improvement exists for all lag lengths, but it is greatest for lag length 40. As shown in the simulation study, there is a substantial improvement in the in-sample properties, with a lower RMSE, which increases the precision of the estimated parameters and decreases the estimation error. In this empirical application, we note that this improvement also holds for real data and that it leads to a decrease in the out-of-sample forecasting error.

5 Summary and concluding remarks

Combining variables at different frequencies in the same model makes it difficult to estimate the distributed lag coefficients, since the effective sample size is determined by the low-frequency variable. Breitung and Roling (2015) propose a smoothed least squares approach (\({SLS}_{1}\)) for unrestricted distributed lag coefficients. In contrast to the Almon or Beta lag distributions used in Ghysels et al. (2006), Breitung and Roling’s (2015) \({SLS}_{1}\) approach ensures a flexible but smooth lag distribution. However, even if the biasing parameter of \({SLS}_{1}\) solves the overparameterization problem, this approach leads to a decreased goodness-of-fit. Therefore, we use a method based on Lipovetsky and Conklin (2005), where we generalize this shrinkage regression into a two-parameter smoothed least squares estimator (\({SLS}_{2}\)). Our \({SLS}_{2}\) estimator contains an additional parameter \(q\) that improves the in-sample fit. We also present mathematical derivations of the conditions under which our new estimator improves the MSE properties. The latter property is important in a forecasting context since it leads to a lower estimation error, which decreases the out-of-sample forecasting error. Furthermore, we conduct Monte-Carlo simulations of the in-sample properties, and in an empirical application, we show the potential benefit for out-of-sample forecasts. Our results clearly demonstrate the advantages of our new \({SLS}_{2}\) approach. Thus, compared to Breitung and Roling’s (2015) one-parameter ridge approximation (\({SLS}_{1}\)), our two-parameter (\({SLS}_{2}\)) shrinkage approach for mixed-frequency models is superior. It preserves orthogonality between the residuals and the predicted dependent variable, which results in a consistent solution. Furthermore, it improves the in-sample properties and is likely to improve out-of-sample forecasts compared to the \({SLS}_{1}\) estimator. Therefore, our new method can be applied as an alternative to previous approaches in the literature, especially when it is difficult to impose a structural pattern on the parameters. In comparison to previous research, we thus demonstrate the strength and practical relevance of our method in generating precise forecasts/nowcasts for important variables and models in economics.