1 Introduction

In a time-series regression model, economic variables are often available at different time frequencies. However, conventional regression analysis is constrained to variables measured at the same frequency, typically in an aggregated form. This limitation discards valuable information from higher-frequency variables solely because of technical constraints. The scientific drawback of always aggregating to a lower common frequency, and thereby disregarding high-frequency information in the data, is widely acknowledged. However, it was not until Ghysels et al. (2004, 2005, 2006) published their influential series of papers that these models gained substantial traction within the research community. Ghysels et al. (2004) introduced the term “mixed-data sampling” (MIDAS) to describe this filtering method, which enables the incorporation of multiple frequencies in prediction models, particularly when the variable with the lowest frequency appears on the left-hand side of the autoregressive distributed lag (ADL) regression equation. The general idea of a MIDAS regression is that it allows a low-frequency regressand to be predicted as a function of regressors sampled at higher and/or the same frequencies. We may incorporate more than one high-frequency regressor and other regressors measured at the low frequency of the regressand, alongside lagged values of the regressand. Thus, the standard MIDAS regression is a general ADL model with a mix of low- and high-frequency variables used to forecast the low-frequency regressand. Clearly, if the regressand is measured at a quarterly frequency, regressors based on monthly or daily frequencies contain more timely and relevant information than the corresponding quarterly sampled regressors or monthly data aggregated to quarterly frequencies. The relevance of MIDAS-type regression is obvious since many macro variables, such as GDP, unemployment, and inflation, are not always updated at high frequencies, while energy data, such as oil prices, are updated frequently. Furthermore, some energy data, such as carbon dioxide emissions, are updated at a lower frequency than, for example, many macro indicators. Therefore, it is intuitive that additional high-frequency information will enhance the predictive performance of most forecasting models in energy economics. This advantage can also be seen in recent research by, for example, Valadkhani and Smyth (2017), Pan et al. (2018), Zhang and Wang (2019), and Wang et al. (2020), where energy data are used to forecast macro variables, or vice versa, depending on the frequency at which different variables are collected.

Traditional MIDAS regression models are usually based on various forms of ADL models, but this vibrant emerging research field is developing rapidly. Ghysels and coauthors were inspired by Almon (1965), who utilized the Weierstrass approximation theorem, which states that every continuous function defined on a closed interval [a, b] can be uniformly approximated, as closely as desired, by a polynomial function of finite order. This is known as the Almon polynomial distributed lag (PDL) weighting, which imposes restrictions on the lag coefficients in autoregressive models and can be used for mixed-frequency weighting. However, the theorem does not state which order of polynomial to use, which results in a model-selection problem and, in turn, potential misspecification issues. The advantage, however, is that the polynomial results in fewer estimated parameters, usually a polynomial of order 2, 3, or possibly 4. Thus, the number of estimated coefficients depends on the polynomial order and not on the number of high-frequency lags. Therefore, when utilizing the parsimonious Almon estimator, fewer parameters need to be estimated. Since the seminal paper by Ghysels et al. (2004), there has been a substantial and growing literature on various MIDAS regression approaches considering different types of conditions. See, among others, Andreou et al. (2010, 2011, 2013), Clements and Galvão (2008, 2009), Ghysels et al. (2005, 2006, 2007), Ghysels and Wright (2009), Monteforte and Moretti (2013), Kvedaras and Račkauskas (2010), Modugno (2013), Penev et al. (2014), and Pan et al. (2018). When the frequency mismatch is small, Foroni et al. (2015) presented the methodology of unrestricted MIDAS (U-MIDAS), which is based on simple OLS.

To decrease the number of estimated coefficients, these methods are generally based on different parametric estimators, which require the parameters to follow a certain pattern, such as that produced by Almon lags. Breitung and Roling (2015) argue that the requirement for the parameter to follow a certain pattern is a major shortcoming of these parametric MIDAS methods. A common empirical situation when this does not hold is when the short-term effect of the high-frequency variable is larger than the long-term effect, which leads to a mixture of positive and negative coefficients. Therefore, Breitung and Roling (2015) suggest a new nonparametric MIDAS estimator (\({SLS}_{1}\)) for which it is not necessary to impose a certain pattern to reduce the number of estimated coefficients. This shrinkage method of Breitung and Roling (2015) has a ridge interpretation and outperforms the parametric MIDAS, especially in the presence of sign-changing parameters.

Compared to ordinary least squares (OLS) estimators, a negative property of shrinkage estimators such as \({SLS}_{1}\) is that they decrease the goodness-of-fit, measured by the coefficient of multiple determination of the regression model. Our suggested approach, based on Lipovetsky and Conklin (2005), improves the out-of-sample performance by improving the in-sample fit.

The purpose of this paper is to improve the Breitung and Roling (2015) estimator, \({SLS}_{1}\), by introducing a two-parameter estimator, \({SLS}_{2}\), based on the work by Lipovetsky and Conklin (2005). This estimator improves the goodness-of-fit of the regression model (see Toker 2020). Furthermore, the increase in the goodness-of-fit obtained by introducing the second parameter also decreases the estimation error, since the mean squared error (MSE) of the estimated parameter vector decreases. This final point is important in forecasting since the forecast error can be decomposed into an estimation error and a random error (see Breitung and Roling 2015). Therefore, our new estimator has the potential to improve forecasts by decreasing the estimation error. Hence, our new approach improves the out-of-sample performance by increasing the goodness-of-fit. We state the conditions under which our newly proposed \({SLS}_{2}\) estimator is superior to the \({SLS}_{1}\) estimator in terms of the MSE. Furthermore, we investigate the small-sample properties using a simulation study. Finally, by forecasting the inflation rate based on crude oil returns, we empirically demonstrate that this decreased estimation error can lead to improved out-of-sample forecasts. In this example, we forecast inflation rates using a simple AR model, MIDAS with Almon lags, and the \({SLS}_{1}\) and proposed \({SLS}_{2}\) estimators, and we show that our new proposed estimator minimizes the forecasting error.

2 Statistical methodology

2.1 Estimation procedure and the proposed estimator for mixed frequency models

Consider the regression model

$${y}_{t+h}={\alpha }_{0}+\sum_{j=0}^{p}{\beta }_{j}{x}_{t,j}+{\varepsilon }_{t+h}$$
(1)

which combines a low-frequency variable \({y}_{t}\) and a high-frequency variable denoted \({x}_{t,j}\). The time index \(t=1,\dots ,T\) refers to the dependent variable measured at the lower frequency, while \(j\) indexes the intraperiod observations of the independent variable. To simplify, we use the same notation as Breitung and Roling (2015). For convenience, we assume that the \(j\) index runs in reverse direction; for instance, the pair \(t\) and \({n}_{t}\) denotes the first intraperiod observation, while the pair \(t\) and 0 denotes the final (most recent) observation. Furthermore, we consider only one independent variable, which enables us to write the MIDAS regression as the linear regression model shown in Eq. (1). The error term is denoted \({\varepsilon }_{t+h}\), and it is assumed to follow the Gauss–Markov assumptions. The lag length \(p\) is assumed to be smaller than the minimum number of intraperiod observations of the independent variable. It is possible to rewrite Eq. (1) using the following matrix notation for simplicity, where we ignore the intercept term \({\alpha }_{0}\):

$${{\text{y}}}^{{\text{h}}}=X\beta +\epsilon ,$$
(2)

where \({{\text{y}}}^{{\text{h}}}={\left[{{\text{y}}}_{1+{\text{h}}},{{\text{y}}}_{2+{\text{h}}},\dots ,{{\text{y}}}_{{\text{T}}+{\text{h}}}\right]}^{\prime}\), \({\text{X}}={\left[{{\text{x}}}_{1},\dots ,{{\text{x}}}_{{\text{T}}}\right]}^{\prime}\) and \({{\text{x}}}_{{\text{t}}}={\left({{\text{x}}}_{{\text{t}},0},{{\text{x}}}_{{\text{t}},1},{{\text{x}}}_{{\text{t}},2},\dots ,{{\text{x}}}_{{\text{t}},{\text{p}}}\right)}^{\prime}\). Furthermore, \(\beta ={\left({\beta }_{0},{\beta }_{1},\dots ,{\beta }_{p}\right)}^{\prime}\) and \({{\text{x}}}_{{\text{t}}}\) are the regression coefficients and the intraperiod high-frequency observations, respectively. \(\epsilon\) is the matrix notation of \({\varepsilon }_{t+h}\); it is uncorrelated with \({{\text{x}}}_{{\text{t}},0},{{\text{x}}}_{{\text{t}},1},{{\text{x}}}_{{\text{t}},2},\dots ,{{\text{x}}}_{{\text{t}},{\text{p}}}\) and \(\epsilon \sim {\text{IN}}(0,{\upsigma }_{\epsilon }^{2})\). For a thorough exploration of the \(SLS\) estimator and its various interpretations, we refer to Breitung and Roling (2015). To estimate the above equation, the \(SLS\) estimator minimizes the following penalized least squares objective function:

$$\widetilde{{\text{S}}}\left({\mathrm{\alpha }}_{0},\upbeta \right)={\Vert {{\text{y}}}^{{\text{h}}}-\mathrm{X\beta }\Vert }^{2}+\uplambda \sum_{j=2}^{p}{\left({\nabla }^{2}{\upbeta }_{j}\right)}^{2},$$
(3)
$${\nabla }^{2}{\upbeta }_{j}={\upbeta }_{j}-2{\upbeta }_{j-1}+{\upbeta }_{j-2},\quad j=2,\dots ,p,$$

where \(\lambda\) is a prespecified smoothing parameter. Notably, when \(\lambda\) is set to zero, the above equation reduces to the least squares objective function. Adding a penalization term to the objective function to reduce overparameterization is a classical statistical shrinkage method, the most notable contribution being Hoerl and Kennard (1970a, b), who introduced the classical ridge estimator. By solving for \(\upbeta\) when the first-order derivative of Eq. (3) is set to zero \(\left(\partial \widetilde{{\text{S}}}\left(\upbeta \right)/\partial \upbeta =0\right)\), the resulting estimator from Breitung and Roling (2015) can be written as:

$${\widetilde{\upbeta }}_{\uplambda }={\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}r,$$
(4)

where the variance of the standardized dependent variable \({{\text{y}}}^{h}\) equals one, \({\text{C}}\) denotes the correlation matrix of the regressors, and \({\text{r}}\) denotes the vector of correlations between the dependent variable and the regressors:

\({\text{C}}={{\text{X}}}^{\prime}{\text{X}},\ {\text{r}}={{\text{X}}}^{\prime}{{\text{y}}}^{h},\ {\text{G}}={{\text{D}}}^{\prime}{\text{D}},\ {{{\text{y}}}^{h}}^{\prime}{y}^{h}=1\), where \({\text{D}}\) is the \(\left({\text{p}}-1\right)\times \left({\text{p}}+1\right)\) second-difference matrix defined below

$${\text{D}}=\left[\begin{array}{cccccc}1& -2& 1& 0& .& 0\\ 0& 1& -2& 1& .& 0\\ .& .& .& .& .& .\\ 0& .& .& 1& -2& 1\end{array}\right]\left({\text{p}}-1\right)\times \left({\text{p}}+1\right).$$
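For concreteness, the following sketch (in Python with NumPy; the function names are ours and purely illustrative) builds the second-difference matrix \({\text{D}}\), the penalty matrix \({\text{G}}={{\text{D}}}^{\prime}{\text{D}}\), and the estimator in Eq. (4) for given standardized data.

```python
import numpy as np

def second_difference_matrix(m):
    """Build the (m-2) x m second-difference matrix D displayed above (m = p + 1)."""
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def sls1(X, y, lam):
    """Smoothed least squares estimator of Eq. (4): (C + lambda*G)^{-1} r."""
    C = X.T @ X                             # correlation matrix of the standardized regressors
    r = X.T @ y                             # correlations of the regressors with y^h
    D = second_difference_matrix(X.shape[1])
    G = D.T @ D                             # penalty matrix G = D'D
    return np.linalg.solve(C + lam * G, r)  # (C + lambda*G)^{-1} r
```

With \(\uplambda =0\) (and a nonsingular \({\text{C}}\)), the call reduces to the unrestricted OLS solution discussed next.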

This estimator is denoted as the \({SLS}_{1}\) estimator in this paper. If \(\uplambda =0\), then this estimator reduces to \(\widetilde{\upbeta }={\left({\text{C}}\right)}^{-1}{\text{r}}\), that is, the unrestricted MIDAS introduced by Foroni et al. (2015), and, as discussed in Lipovetsky and Conklin (2005), the least squares objective function is minimized in this situation. Therefore, the \({SLS}_{1}\) estimator solves the issue of overparameterization by introducing the biasing parameter \(\uplambda\), but at the same time, it decreases the goodness-of-fit. By assuming orthogonality between the high-frequency regressors and the error term (\({{\text{X}}}^{\prime}\upvarepsilon =0\)), this may be shown in the MIDAS context using the following identities:

$${{\text{R}}}^{2}+{{\text{S}}}^{2}=1$$
(5)

which can also be written as

$${{\text{R}}}^{2}=1-{{\text{S}}}^{2}={\upbeta }^{\mathrm{^{\prime}}}{\text{r}}={\upbeta }^{\mathrm{^{\prime}}}\mathrm{C\beta }$$
(5a)

and it can be expressed as

$${{\text{R}}}^{2}=1-{{\text{S}}}^{2}={2\upbeta }^{\prime}{\text{r}}-{\upbeta }^{\prime}\mathrm{C\beta }={\upbeta }^{\prime}\left(2{\text{r}}-\mathrm{C\beta }\right),$$
$${{\text{S}}}^{2}=1-{{\text{R}}}^{2}=1-{{\widetilde{{\text{y}}}}^{h}}^{\prime}{\widetilde{{\text{y}}}}^{h}={{y}^{h}}^{\prime}{y}^{h}-{{\widetilde{{\text{y}}}}^{h}}^{\prime}{\widetilde{{\text{y}}}}^{h}={{y}^{h}}^{\prime}\varepsilon .$$
(5b)

Equation (5) exhibits the Pythagorean theorem, where \({{\text{R}}}^{2}\) is the share of the variation explained by the regression and \({{\text{S}}}^{2}\) is the unexplained share. The trivial expressions above hold for the U-MIDAS model, but for the \({SLS}_{1}\) estimator, \({{\text{R}}}^{2}\) corresponds to:

$${R}_{{SLS}_{1}}^{2}=2{r}{\prime}{(C+\lambda G)}^{-1}r-{r}{\prime}{\left(C+\lambda G\right)}^{-1}C{(C+\lambda G)}^{-1}r$$
(5c)

which is lower than the \({{\text{R}}}^{2}\) for the U-MIDAS model (see the Supplementary Material for detailed derivations). Hence, the \({SLS}_{1}\) estimator yields a smoothed version of the unrestricted parameter vector, which improves the forecasting properties compared to U-MIDAS. However, it does so by shrinking the unrestricted parameter vector toward zero, which decreases the goodness-of-fit. This drawback of shrinkage estimators such as \({SLS}_{1}\) led Lipovetsky and Conklin (2005) to introduce a two-parameter estimator that increases the \({{\text{R}}}^{2}\). Additionally, it decreases the MSE of the estimated parameter vector, which decreases the estimation error (see Sect. 2.2 for detailed derivations). In a forecasting context, this has the potential to decrease the out-of-sample MSE of the forecasts, since the forecast error can be decomposed into an estimation error and a random error. Therefore, in the same manner as Lipovetsky and Conklin (2005), we introduce a two-parameter \({SLS}_{2}\) estimator by generalizing Eq. (3) with two additional terms so that the objective function can be represented as:

$$\widetilde{{\text{S}}}\left({\mathrm{\alpha }}_{0},\upbeta \right)={\Vert {{\text{y}}}^{{\text{h}}}-\mathrm{X\beta }\Vert }^{2}+{\uplambda }_{1}\sum_{j=2}^{p}{\left({\nabla }^{2}{\upbeta }_{j}\right)}^{2}+{\uplambda }_{2}{\Vert {\text{r}}-\upbeta \Vert }^{2}+{\uplambda }_{3}{\left({{{\text{y}}}^{{\text{h}}}}^{\prime}\left({{\text{y}}}^{{\text{h}}}-\mathrm{X\beta }\right)\right)}^{2}.$$
(6)

The first term in Eq. (6) represents the least squares minimization of the residuals, whereas the second term, with parameter \({\uplambda }_{1}\), smooths the estimates in the same fashion as \({SLS}_{1}\). The third term, with parameter \({\uplambda }_{2}\), corresponds to minimizing the deviation of the estimates \(\upbeta\) from the pairwise correlations \({\text{r}}\) with the dependent variable. The benefit of including this term is that we end up with interpretable coefficients that have the same sign as the pairwise correlations. The fourth term, with parameter \({\uplambda }_{3}\), minimizes the residual term in Eq. (5b). The net effect of including one more variable in the model corresponds to \({{\text{R}}}^{2}={\upbeta }^{\prime}{\text{r}}=\sum_{j=1}^{p}{\upbeta }_{j}{r}_{j}=\sum_{j=1}^{p}{R}_{j}^{2}\), where \({\upbeta }_{j}\) and \({r}_{j}\) are the elements of the vectors \(\upbeta ={{\text{C}}}^{-1}r\), the standardized coefficients of regression Eq. (1), and \({\text{r}}={{\text{X}}}^{\prime}{{\text{y}}}^{h}\), respectively. Therefore, this parameter corresponds to obtaining a maximum coefficient of multiple determination, which will lead to a better fit. A further practical advantage of this solution emerged when we performed numerous estimations on various real data sets, running the calculations over a grid of values of the parameter \({\text{q}}\) and evaluating, in each estimation, the coefficient of multiple determination in its general form given in Eq. (5a). By minimizing Eq. (6), we obtain the following estimator:

$${\widehat{\upbeta }}_{{\text{q}}}\left(\uplambda \right)={\text{q}}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{r}},$$
(7)

where \(\uplambda\) and \({\text{q}}\) are two parameters that should be estimated based on the data (a detailed description of the estimation is provided in Sect. 2.3). Notably, the two-parameter \({SLS}_{2}\) estimator proposed in Eq. (7) differs from the estimator in Eq. (4) through the second parameter \({\text{q}}\). Thus, the solution of the two-parameter estimator is proportional to the one-parameter estimator given in Eq. (4). This minor change leads to an increased goodness-of-fit. For a fixed value of \(\uplambda\), the value of the parameter \({\text{q}}\) can be calculated by maximizing the quality of the regression, measured by the coefficient of multiple determination. By substituting Eq. (7) into (5a), we obtain the following expression for the coefficient of multiple determination:

$${{\text{R}}}^{2}=2{{\text{qr}}}^{\prime}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{r}}-{{\text{q}}}^{2}{{\text{r}}}^{\prime}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{C}}{\left({\text{C}}+\mathrm{\lambda G}\right)}^{-1}{\text{r}}=2{{\text{qQ}}}_{1}-{{\text{q}}}^{2}{{\text{Q}}}_{2}$$
(8)

which can be solved for q:

$${\text{q}}=\frac{{{\text{Q}}}_{1}}{{{\text{Q}}}_{2}}=\frac{{{\text{r}}}^{\prime}{({\text{C}}+\mathrm{\lambda G})}^{-1}{\text{r}}}{{{\text{r}}}^{\prime}{({\text{C}}+\mathrm{\lambda G})}^{-1}{\text{C}}{({\text{C}}+\mathrm{\lambda G})}^{-1}{\text{r}}}.$$
(9)

Here, \({{\text{Q}}}_{1}\) and \({{\text{Q}}}_{2}\) denote the two quadratic forms in Eq. (8). Equation (9) gives the value of \({\text{q}}\) at which this function reaches its maximum, and this value depends on the parameter \(\uplambda\). Substituting \({\text{q}}\) from Eq. (9) into (8) yields \({{\text{R}}}^{2}= {{\text{r}}}^{\prime}\upbeta = {\upbeta }^{\prime}\mathrm{C\beta }\). This expression shows that \({{\text{R}}}^{2}\) takes the form given in Eq. (5a). Hence, in the \({SLS}_{2}\) solution, the smoothing parameter \(\uplambda\) regularizes the regression model, and the parameter \({\text{q}}\) is used to improve the quality of the fit. In addition, the proposed \({SLS}_{2}\) estimator given in Eq. (7) satisfies the condition of orthogonality and improves the goodness-of-fit of the regression model by increasing the \({{\text{R}}}^{2}\). The \({SLS}_{2}\) estimator can be interpreted as a two-parameter ridge shrinkage estimator and can also be interpreted in a Bayesian manner (see Lipovetsky and Conklin 2005 for details).
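To make the construction explicit, the following sketch (Python/NumPy; names are ours) computes \({\text{q}}\) from Eq. (9) and the resulting \({SLS}_{2}\) estimate from Eq. (7), together with the \({{\text{R}}}^{2}\) of Eq. (8), for a prespecified \(\uplambda\).

```python
import numpy as np

def sls2(X, y, lam):
    """Two-parameter SLS_2 estimator: q from Eq. (9) times the SLS_1 solution, Eq. (7)."""
    C = X.T @ X
    r = X.T @ y
    D = np.diff(np.eye(X.shape[1]), n=2, axis=0)   # (p-1) x (p+1) second-difference matrix
    G = D.T @ D
    b1 = np.linalg.solve(C + lam * G, r)           # SLS_1 estimate, (C + lambda*G)^{-1} r
    Q1 = r @ b1                                    # r'(C + lambda*G)^{-1} r
    Q2 = b1 @ C @ b1                               # r'(C + lambda*G)^{-1} C (C + lambda*G)^{-1} r
    q = Q1 / Q2                                    # Eq. (9)
    beta_q = q * b1                                # Eq. (7)
    r2 = 2 * q * Q1 - q**2 * Q2                    # Eq. (8), the in-sample R^2 at the optimal q
    return beta_q, q, r2
```

The returned \({{\text{R}}}^{2}\) equals \({{\text{Q}}}_{1}^{2}/{{\text{Q}}}_{2}\), which is never smaller than the \({{\text{R}}}^{2}\) of the corresponding \({SLS}_{1}\) fit (obtained by setting \(q=1\)).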

2.2 Matrix mean squared error comparison

The MSE criterion for the parameter vector is important since the forecasting error can be decomposed into an estimation error and a random error term. The estimation error can be decreased using an improved estimation technique that decreases the MSE, such as the \({SLS}_{2}\) estimator. Therefore, we compare the proposed \({SLS}_{2}\) estimator with the previously mentioned \({SLS}_{1}\) estimator under the MSE criterion. The general expression for the MSE of an estimator \(\widetilde{\beta }\) of the true parameter vector \(\beta\) is:

$$MSE\left( \widetilde{\beta } \right)=E\left[\left(\widetilde{\beta }-\beta \right){(\widetilde{\beta }-\beta )}{\prime}\right]=Var\left(\widetilde{\beta }\right)+Bias\left(\widetilde{\beta }\right){Bias(\widetilde{\beta })}{\prime},$$

where \(Var\left(\widetilde{\beta }\right)= E\left[\left(\widetilde{\beta }-E\left(\widetilde{\beta }\right)\right){\left(\widetilde{\beta }-E\left(\widetilde{\beta }\right)\right)}^{\prime}\right]\) and \(Bias\left(\widetilde{\beta }\right)=E\left(\widetilde{\beta }\right)- \beta\). In this section, we compare the theoretical performance of \({SLS}_{1}\) with that of our proposed estimator \({SLS}_{2}\). By using the spectral decomposition \(C=P\Lambda {P}^{\prime}\), where \(\Lambda\) is a diagonal matrix whose diagonal elements are the eigenvalues of the \({X}^{\prime}X\) matrix and \(P\) is the matrix whose columns are the corresponding eigenvectors of the \({X}^{\prime}X\) matrix, the OLS, \({SLS}_{1}\) and \({SLS}_{2}\) estimators can be rewritten as

$$\widehat{\beta }=\left(P{\Lambda }^{-1}{P}{\prime}\right)r,$$
(10)
$$\widetilde{\beta }\left(\lambda \right)={\left(P\Lambda {P}{\prime}+\lambda G\right)}^{-1}r,$$
(11)
$${\widetilde{\beta }}_{q}\left(\lambda \right)=q{\left(P\Lambda {P}{\prime}+\lambda G\right)}^{-1}r.$$
(12)

The corresponding MSEs of these estimators are respectively,

$$MSE\left(\widehat{\beta }\right)={\sigma }^{2} P{\Lambda }^{-1}{P}{\prime},$$
(13)
$$MSE\left(\widetilde{\beta }\left(\lambda \right)\right)={\sigma }^{2} {G}_{\lambda }+\left[Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\right]{\left[Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\right]}^{\prime},$$
(14)
$$MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)={q}^{2}{\sigma }^{2} {G}_{\lambda }+\left[Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\right]{\left[Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\right]}^{\prime},$$
(15)

where \({G}_{\lambda }={\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}P\Lambda {P}^{\prime}{\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}\), \(Bias \left(\widetilde{\beta }\left(\lambda \right)\right)= -\lambda {\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}G\beta\) and \(Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)=\left[q{\left(P\Lambda {P}^{\prime}+\lambda G\right)}^{-1}P\Lambda {P}^{\prime}-I\right] \beta .\)
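As an illustration of Eqs. (13)–(15), the sketch below (Python/NumPy; the toy inputs \(\beta\), \({\sigma }^{2}\), \(\lambda\) and \(q\) are supplied by the user and are ours) evaluates the three MSE matrices for a given design matrix, following the bias expressions stated above, so that, for example, their traces can be compared.

```python
import numpy as np

def mse_matrices(X, beta, sigma2, lam, q):
    """MSE matrices of the OLS/U-MIDAS, SLS_1 and SLS_2 estimators, Eqs. (13)-(15)."""
    C = X.T @ X
    D = np.diff(np.eye(X.shape[1]), n=2, axis=0)   # second-difference matrix D
    G = D.T @ D
    A_inv = np.linalg.inv(C + lam * G)
    I = np.eye(C.shape[0])

    G_lam = A_inv @ C @ A_inv                       # (C + lambda*G)^{-1} C (C + lambda*G)^{-1}
    mse_ols = sigma2 * np.linalg.inv(C)             # Eq. (13)

    bias1 = -lam * A_inv @ G @ beta                 # bias of the SLS_1 estimator
    mse_sls1 = sigma2 * G_lam + np.outer(bias1, bias1)          # Eq. (14)

    bias2 = (q * A_inv @ C - I) @ beta              # bias of the SLS_2 estimator
    mse_sls2 = q**2 * sigma2 * G_lam + np.outer(bias2, bias2)   # Eq. (15)
    return mse_ols, mse_sls1, mse_sls2
```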

A comparison of the MSEs can be conducted using the quality-of-fit parameter \(q>1\), which is attained by maximizing the multiple correlation coefficient (see Lipovetsky 2006). Furthermore, the following corollaries of Farebrother (1976) and Trenkler and Toutenburg (1990) are used to compare them. Theorem 2.1 then provides the necessary and sufficient condition for comparing the MSE of unrestricted MIDAS and \({SLS}_{2}\) for a fixed value of \(q\). The empirical relevance is that if \(q\) is greater than 1, then \(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\) will be lower than \(MSE\left(\widehat{\beta }\right)\). This is a natural result since, when \(q<1\), the parameters of \({SLS}_{2}\) are pushed further toward zero, which decreases the in-sample fit. Therefore, for values of \(q\) greater than 1, we can expect a lower MSE, which decreases the estimation error and leads to more efficient forecasting results. Theorems 2.2 and 2.3 compare the variances and MSEs of \({SLS}_{1}\) and \({SLS}_{2}\) and give the corresponding positive definiteness conditions. A positive definite variance difference is attained if \(\lambda >0\) and \(q>1\); if \(q=1\), then \({SLS}_{1}\) and \({SLS}_{2}\) become identical. Theorem 2.3 then gives the condition under which the difference between the MSE matrices of \({SLS}_{2}\) and \({SLS}_{1}\) is positive definite.

Corollary 2.1

(Farebrother 1976). Let \(A\) be a positive definite matrix, \(a\) a nonzero vector, and \(\theta\) a positive scalar. Then, the necessary and sufficient condition for \(\theta A-a{a}^{\prime}>0\) is that \({a}^{\prime}{A}^{-1}a< \theta\).

Corollary 2.2

(Trenkler and Toutenburg 1990). Let \({\widetilde{\beta }}_{1}\) and \({\widetilde{\beta }}_{2}\) be two estimators of \(\beta\). Let \(H=Var\left({\widetilde{\beta }}_{1}\right)-Var ({\widetilde{\beta }}_{2})\) be a positive definite matrix, \({h}_{1}=Bias ({\widetilde{\beta }}_{1})\) and \({h}_{2}=Bias ({\widetilde{\beta }}_{2})\).

Then, \(MSE \left({\widetilde{\beta }}_{1}\right)-MSE \left({\widetilde{\beta }}_{2}\right)>0\) if and only if \({{h}_{2}}^{\prime}{\left(H+{h}_{1}{{h}_{1}}^{\prime}\right)}^{-1}{h}_{2}<1.\)

Theorem 2.1: Necessary and sufficient conditions for the estimator

For a fixed value of \(q\), Theorem 2.1 provides the criterion for comparing the MSE of the unrestricted MIDAS estimator (the \(\lambda =0\) case) with that of \({SLS}_{2}\).

Let \(\lambda >{\lambda }_{1j}\) and \({b}_{2}=Bias ({\widetilde{\beta }}_{q}\left(\lambda \right))\), where \({\lambda }_{1j}={k}_{j}\left(q-1\right)\), \(j=1,\dots ,p\), and \({k}_{j}\) denote the eigenvalues of \(C\). The necessary and sufficient condition for \(MSE\left(\widehat{\beta }\right)-MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)>0\) is \({{b}_{2}}^{\prime}{\left({C}^{-1}-{q}^{2}{G}_{\lambda }\right)}^{-1}{b}_{2}<{\sigma }^{2}\).

Proof

The difference between the MSE matrices of the estimators in Eqs. (10) and (12) is

$$MSE\left(\widehat{\beta }\right)-MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)= {\sigma }^{2}\left({C}^{-1}-{q}^{2}{G}_{\lambda }\right){-b}_{2}{{b}_{2}}^{\prime}={\sigma }^{2}\left[P\, diag{\left\{\frac{1}{{k}_{j}}-\frac{{q}^{2}{k}_{j}}{{\left({k}_{j}+\lambda \right)}^{2}}\right\}}_{j=1}^{p}{P}^{\prime}\right]{-b}_{2}{{b}_{2}}^{\prime},$$

where \(\left({C}^{-1}-{q}^{2}{G}_{\lambda }\right)\) is positive definite if \({\lambda }^{2}+2{k}_{j}\lambda +\left(1-{q}^{2}\right){{k}_{j}}^{2}>0\) for all \(j\). Now, \({\lambda }^{2}+2{k}_{j}\lambda +\left(1-{q}^{2}\right){{k}_{j}}^{2}\) is positive when \(\lambda > {\lambda }_{1j}={k}_{j}(q-1)\). By Corollary 2.1, the proof of the theorem is complete.

Theorem 2.2: Positive definite matrix condition

Let \({b}_{1}=Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\) and \({b}_{2}= Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\). If \({{b}_{1}}^{\prime}{G}_{\lambda }^{-1}{b}_{1}< {\sigma }^{2}\left({q}^{2}-1\right)\), then \(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)>0\).

Proof

\(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)=Var\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-Var\left(\widetilde{\beta }\left(\lambda \right)\right)+{b}_{2}{{b}_{2}}{\prime}-{b}_{1}{{b}_{1}}{\prime}={\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }+{b}_{2}{{b}_{2}}{\prime}-{b}_{1}{{b}_{1}}{\prime}.\)

\({\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }\) is always a positive definite matrix for \(\lambda >0\) and \(q>1\). Furthermore, \({b}_{2}{{b}_{2}}^{\prime}\) is positive semidefinite. Therefore, the question of whether \(MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)\) is a positive definite matrix reduces to that of whether \({\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }-{b}_{1}{{b}_{1}}^{\prime}\) is a positive definite matrix. Accordingly, by Corollary 2.1 and the condition of the theorem, \({\sigma }^{2} \left({q}^{2}-1\right){G}_{\lambda }-{b}_{1}{{b}_{1}}^{\prime}\) is a positive definite matrix; thus, the proof is complete.

Theorem 2.3: Comparison of the two estimators’ sampling variances

Let \({b}_{1}=Bias\left(\widetilde{\beta }\left(\lambda \right)\right)\) and \({b}_{2}= Bias\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)\). Then

$$MSE\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)-MSE\left(\widetilde{\beta }\left(\lambda \right)\right)>0\ \text{ if and only if }\ {{b}_{1}}^{\prime}{\left({\sigma }^{2}\left({q}^{2}-1\right){G}_{\lambda }+{b}_{2}{{b}_{2}}^{\prime}\right)}^{-1}{b}_{1}<1.$$

Proof

For \(\lambda >0\) and \(q>1\), \(Var\left({\widetilde{\beta }}_{q}\left(\lambda \right)\right)- Var\left(\widetilde{\beta }\left(\lambda \right)\right)= {\sigma }^{2}({q}^{2}-1){G}_{\lambda }\), which is a positive definite matrix. Therefore, using Corollary 2.2, the proof is complete.

2.3 Estimation of the smoothing parameter \({\varvec{\lambda}}\) and the parameter \(q\)

In this section, we demonstrate how to estimate the smoothing parameter \(\lambda\) and the parameter \(q\) of the \({SLS}_{2}\) estimator. A potential approach for \(\lambda\) is cross-validation with the \(q\) value fixed. Another commonly used method is to choose the smoothing parameter \(\lambda\) by minimizing the AIC (Breitung and Roling 2015). In this article, we rely on the Breitung and Roling (2015) method by first minimizing the AIC. We then improve the fit of the regression model using the additional parameter \(q\), and we observe more reliable results and a better fit of the regression model with this additional parameter. Additionally, we outline the criteria used for selecting the smoothing parameter. We define \(\Lambda =diag({k}_{1},\dots ,{k}_{p})\) as a diagonal matrix whose diagonal elements are the eigenvalues of \({X}^{\prime}X\), whereas \(\Upsilon\) is a \(p\times p\) matrix whose columns are the eigenvectors of the matrix \({X}^{\prime}X\), fulfilling \({\Upsilon }^{\prime}{X}^{\prime}X\Upsilon =\Lambda\) and \({\Upsilon }^{\prime}\Upsilon =I\). Subsequently, the original model can be written in canonical form,

$${y}^{h}=Z\alpha +{\varepsilon }_{t+h},$$
(16)

where \(Z=X\Upsilon\), \(\alpha ={\Upsilon }^{\prime}\beta\), and \({Z}^{\prime}Z={\Upsilon }^{\prime}{X}^{\prime}X\Upsilon =\Lambda\). Then, \({\widehat{\alpha }}_{q}\left(\lambda \right)={\Upsilon }^{\prime}{\widetilde{\beta }}_{q}\left(\lambda \right)\), and \(MSE\left({\widehat{\alpha }}_{q}\left(\lambda \right)\right)={\Upsilon }^{\prime}MSE({\widetilde{\beta }}_{q}(\lambda ))\Upsilon\). Therefore, the MSE of the estimator \({\widehat{\alpha }}_{q}\left(\lambda \right)\) can be written as

$$MSE\left({\widehat{\alpha }}_{q}\left(\lambda \right)\right)= {q}^{2}{\sigma }^{2}{\left(\Lambda +\lambda G\right)}^{-1}\Lambda {\left(\Lambda +\lambda G\right)}^{-1}+\left[q{\left(\Lambda +\lambda G\right)}^{-1}\Lambda -I\right]\alpha {\alpha }^{\prime}{\left[q{\left(\Lambda +\lambda G\right)}^{-1}\Lambda -I\right]}^{\prime}.$$

The next step is then to obtain the optimal values of the parameters \(\lambda\) and \(q\) by minimizing the following function:

$$f\left(\lambda ,q\right)=trace\left(MSE\left({\widehat{\alpha }}_{q}\left(\lambda \right)\right)\right)=\sum_{i=1}^{p}\frac{{q}^{2}{\sigma }^{2}{k}_{i}+{\alpha }_{i}^{2}{(q{k}_{i}-{k}_{i}-\lambda )}^{2}}{{({k}_{i}+\lambda )}^{2}}.$$
(17)

Here, \(f\left(\lambda ,q\right)\) is a quadratic function of the parameter \(q\), so the value of \(q\) can be derived by differentiating \(f\left(\lambda ,q\right)\) with respect to \(q\) for a fixed \(\lambda\) and setting the derivative equal to zero. After the parameters \({\sigma }^{2}\) and \({\alpha }_{i}^{2}\) are replaced by their corresponding unbiased estimators \({\widehat{\sigma }}^{2}\) and \({\widehat{\alpha }}_{i}^{2}\), we obtain the optimal estimator of \(q\) for a fixed value of \(\lambda\) as

$$\frac{\partial f\left(\lambda ,q\right)}{\partial q}=\sum_{i=1}^{p}\frac{2q{\sigma }^{2}{k}_{i}+2{\alpha }_{i}^{2}{k}_{i}\left(q{k}_{i}-{k}_{i}-\lambda \right)}{{\left({k}_{i}+\lambda \right)}^{2}}=0,$$
$${\widehat{q}}_{opt}=\frac{\sum_{i=1}^{p}\frac{{\widehat{\alpha }}_{i}^{2}{k}_{i}}{{k}_{i}+\lambda }}{\sum_{i=1}^{p}\frac{{\widehat{\sigma }}^{2}{k}_{i}+{\widehat{\alpha }}_{i}^{2}{k}_{i}^{2}}{{{(k}_{i}+\lambda )}^{2}}}.$$
(18)

Similarly, we may derive the value of \(\lambda\) that minimizes \(f(\lambda ,q)\) by differentiating with respect to \(\lambda\) and setting the derivative equal to zero, with the \(q\) value fixed and the parameters \({\alpha }_{i}^{2}\) and \({\sigma }^{2}\) replaced by their unbiased estimates. The unbiased estimates of \({\alpha }_{i}^{2}\) and \({\sigma }^{2}\) can be obtained by initiating the iterative procedure explained in detail in Sect. 7 of Hoerl and Kennard (1970). The optimal value of \(\lambda\) corresponds to:

$${\widehat{\lambda }}_{opt}=\frac{q\sum_{i=1}^{p}{\widehat{\sigma }}^{2}{k}_{i}+(q-1)\sum_{i=1}^{p}{{\widehat{\alpha }}_{i}}^{2}{k}_{i}}{\sum_{i=1}^{p}{{\widehat{\alpha }}_{i}}^{2}{k}_{i}}.$$
(19)

In practice, the parameters \(\lambda\) and \(q\) depend on each other. As a first step, we therefore estimate the optimal value of \(\lambda\) by minimizing the Akaike information criterion (AIC), as suggested by Breitung and Roling (2015). Subsequently, we calculate the optimal value of \(q\) based on the estimated value of \(\lambda\).
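A minimal sketch of this two-step procedure is given below (Python/NumPy; function names are ours). It assumes that the AIC for the penalized fit is computed with effective degrees of freedom equal to the trace of the smoother matrix \(X{\left({X}^{\prime}X+\lambda G\right)}^{-1}{X}^{\prime}\), which is one common convention and may differ in detail from the criterion used by Breitung and Roling (2015), and that the unrestricted canonical coefficients \({\widehat{\alpha }}_{i}\) can be estimated by least squares.

```python
import numpy as np

def choose_lambda_aic(X, y, grid):
    """Step 1: choose lambda by minimizing an AIC over a grid of candidate values."""
    T = len(y)
    D = np.diff(np.eye(X.shape[1]), n=2, axis=0)
    G = D.T @ D
    best_lam, best_aic = grid[0], np.inf
    for lam in grid:
        S = X @ np.linalg.solve(X.T @ X + lam * G, X.T)    # smoother ("hat") matrix
        resid = y - S @ y
        aic = T * np.log(resid @ resid / T) + 2.0 * np.trace(S)
        if aic < best_aic:
            best_lam, best_aic = lam, aic
    return best_lam

def choose_q(X, y, lam):
    """Step 2: q_opt from Eq. (18), computed in the canonical coordinates of Eq. (16)."""
    k, P = np.linalg.eigh(X.T @ X)                         # eigenvalues k_i and eigenvectors of X'X
    Z = X @ P
    alpha_hat = np.linalg.lstsq(Z, y, rcond=None)[0]       # unrestricted canonical coefficients
    resid = y - Z @ alpha_hat
    sigma2_hat = resid @ resid / (len(y) - Z.shape[1])     # error-variance estimate
    num = np.sum(alpha_hat**2 * k / (k + lam))
    den = np.sum((sigma2_hat * k + alpha_hat**2 * k**2) / (k + lam)**2)
    return num / den                                       # Eq. (18)
```

The resulting pair \((\widehat{\lambda },\widehat{q})\) can then be plugged into Eq. (7), e.g. as \(\widehat{q}\,{\left(C+\widehat{\lambda }G\right)}^{-1}r\).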

3 Simulation and estimation performance

3.1 Design of the experiment

In this section, we compare the small-sample properties of the nonparametric \({SLS}_{1}\) estimator and our two-parameter \({SLS}_{2}\) estimator of the MIDAS regression. For this comparative study, following Andreou et al. (2010), we generate data from the following equations:

$${y}_{t+h}={\beta }_{0}+\sum_{j=0}^{p}{\beta }_{j}{x}_{t,j}+{\varepsilon }_{t+h},$$
(20)
$${\beta }_{j}={\alpha }_{1}{\omega }_{j}\left(\theta \right),$$
(21)
$${\varepsilon }_{t+h}\sim {{\text{N}}}_{iid}\left(\mathrm{0,0.125}\right),$$
(22)

where \(t=1,2,\dots ,T\), \({\beta }_{0}=0.5\), and \({\omega }_{j}\left(\cdot \right)\) is a weighting function that can be chosen from the several specifications presented below. The high-frequency regressor is generated by the \(AR(1)\) process given below:

$${x}_{t,j}={\alpha }_{0}+\varrho {x}_{t,j-1}+{\varepsilon }_{j,t}, {\varepsilon }_{j,t}\sim iid N\left(\mathrm{0,1}\right),$$
(23)

where \(j=0,1,\dots ,p\) and \({x}_{t,j-p-k}={x}_{t-1,j-k}\) for all \(k>0\). Correspondingly, \({x}_{t,j}\) denotes the \(j\)th lag of the \(AR\left(1\right)\) series \({x}_{t,0}\). As suggested by Andreou et al. (2010), we use \({\alpha }_{0}=0.5\) and \(\varrho =0.9\). The MSE of the proposed estimator can be obtained as \(E\left[{\left({\widetilde{\beta }}_{q}\left(\lambda \right)-\beta \right)}^{\prime}\left({\widetilde{\beta }}_{q}\left(\lambda \right)-\beta \right)\right]=q\left(\left({\beta }^{\prime}{{\Psi }_{T}}^{\prime}\left(\lambda \right){\Psi }_{T}\left(\lambda \right)\beta \right)+{\sigma }_{u}^{2}\,trace\left[{\left({X}^{\prime}X+\lambda G\right)}^{-1}{X}^{\prime}X{\left({X}^{\prime}X+\lambda G\right)}^{-1}\right]\right)\), where \({\Psi }_{T}\left(\lambda \right)= \lambda {\left({X}^{\prime}X+\lambda G\right)}^{-1}G\). The chosen sample sizes for the model in (20)–(22) are \(T \in \left\{100, 200, 400\right\}\), and the considered numbers of high-frequency lags are \(\left(p+1\right)\in \left\{20, 40, 60\right\}\). The scale parameter in Eq. (21) is chosen as \({\alpha }_{1}\in \left\{0.2, 0.3, 0.4\right\}\) to model small, medium, and large signal-to-noise ratios. However, the implied \({R}^{2}\) responds slightly differently to different sample sizes, numbers of high-frequency lags, and weighting functions. These signal-to-noise ratios approximately satisfy \(0.40\le {R}^{2}<0.50\), \(0.50\le {R}^{2}\le 0.70\), and \({R}^{2}>0.70\), respectively. The number of Monte-Carlo replications is 5000.
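For reference, one replication of this data-generating process can be sketched as follows (Python/NumPy; function and variable names are ours, the weighting functions \({\omega }_{j}\left(\theta \right)\) entering Eq. (21) are specified in the three experiments below, and the value 0.125 in Eq. (22) is treated as the error variance).

```python
import numpy as np

def simulate_midas_sample(T, weights, alpha0=0.5, rho=0.9, beta0=0.5,
                          sigma2_eps=0.125, seed=None):
    """One replication of the DGP in Eqs. (20)-(23); `weights` holds beta_j = alpha_1 * omega_j(theta)."""
    rng = np.random.default_rng(seed)
    m = len(weights)                              # m = p + 1 intraperiod observations per period
    n_high = T * m + m                            # one extra low-frequency period as burn-in
    x = np.empty(n_high)
    x[0] = alpha0 / (1.0 - rho)                   # start the AR(1) at its unconditional mean
    for s in range(1, n_high):                    # high-frequency AR(1) regressor, Eq. (23)
        x[s] = alpha0 + rho * x[s - 1] + rng.standard_normal()
    X = np.empty((T, m))
    for t in range(T):
        end = m * (t + 2)                         # position just after the last obs of period t
        X[t, :] = x[end - 1:end - 1 - m:-1]       # j runs backwards: j = 0 is the most recent obs
    y = beta0 + X @ weights + rng.normal(0.0, np.sqrt(sigma2_eps), size=T)   # Eqs. (20)-(22)
    return X, y
```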

The Monte-Carlo analysis considers lag distributions that fit both the nonparametric and parametric MIDAS frameworks (Breitung and Roling 2015). Three different weighting functions are considered and presented below. They are meant to cover different types of behavior of the high-frequency weights, including swift decay, where many weights are approximately equal to zero beyond some lag, and slow decay. The initial experiment is conducted by applying the exponential (slow decay) weighting function given as

$${\omega }_{j}\left(\theta \right)=\frac{{\text{exp}}({\theta }_{1}j+{\theta }_{2}{j}^{2})}{\sum_{i=0}^{p}{\text{exp}}({\theta }_{1}i+{\theta }_{2}{i}^{2})}, j=\mathrm{0,1}\dots p.$$

We follow Andreou et al. (2010) and set \({\theta }_{1}=7\times {10}^{-4}\) and \({\theta }_{2}=-6\times {10}^{-3}\). The second experiment employs a hump-shaped weighting function.

$${\omega }_{j}\left(\theta \right)=\frac{{\text{exp}}({\theta }_{1}j+{\theta }_{2}{j}^{2})}{\sum_{i=0}^{p}{\text{exp}}({\theta }_{1}i+{\theta }_{2}{i}^{2})}, j=\mathrm{0,1}\dots p.$$

We set the parameters such that the weighting function reaches its maximum at \(j=6\), \(j=10\), and \(j=16\) when 20, 40, and 60 lags are retained, respectively. We choose \({\theta }_{1}=8\times {10}^{-2}\) and \({\theta }_{2}=-{\theta }_{1}/10\), \({\theta }_{2}=-{\theta }_{1}/20\), and \({\theta }_{2}=-{\theta }_{1}/30\). The third experiment is run by utilizing the sign-changing weighting function given as

$${\omega }_{j}\left({c}_{1},{c}_{2}\right)=\frac{{c}_{1}}{p+1}\left[{\text{sin}}\left({c}_{2}+\frac{j2\pi }{p}\right)\right],$$

where \({c}_{2}=1\times {10}^{-2}\) and \({c}_{1}=5\), \({c}_{1}=2.5\), and \({c}_{1}= 5/3\) for lags 20, 40, and 60, respectively, ensuring that these weights sum to one. The simulation results are given in Table 1.

Table 1 In-sample RMSE ratios

3.2 Results and discussion

In Table 1, we present a comparison of the root mean squared error (RMSE) ratios between the \({SLS}_{2}\) and \({SLS}_{1}\) estimators. Since Breitung and Roling (2015) demonstrate that the \({SLS}_{1}\) nonparametric approach is superior to the usual parametric approach, we compare only the \({SLS}_{1}\) estimator to the improved two-parameter \({SLS}_{2}\) estimator. The results in Table 1 are summarized as follows. First, the proposed nonparametric \({SLS}_{2}\) estimator with the additional parameter \(q\) dominates the \({SLS}_{1}\) nonparametric approach for in-sample estimation in almost all situations considered. This improvement is often very substantial, with a relative RMSE close to zero. Clearly, the proposed \({SLS}_{2}\) estimator outperforms the \({SLS}_{1}\) estimator for in-sample estimation with hump-shaped weighting functions. For exponentially declining weights and hump-shaped weights, our new method dominates the \({SLS}_{1}\) one-parameter estimator suggested by Breitung and Roling (2015) in all situations. For sign-changing weights, the proposed two-parameter \({SLS}_{2}\) estimator performs comparably well, except when \(p+1=20\): our estimator yields a lower RMSE than that of Breitung and Roling (2015) in most situations, but when the lag length equals 20 and \({\alpha }_{1}=0.3\) or \({\alpha }_{1}=0.4\), the \({SLS}_{1}\) estimator is more efficient in terms of RMSE in almost all situations considered. Moreover, Table 1 indicates that as the sample size increases, the \({\widehat{q}}_{opt}\) value decreases. Based on these simulation results, we believe that for large sample sizes, \({\widehat{q}}_{opt}\) will be close to one, which indicates a convergence between the \({SLS}_{1}\) and \({SLS}_{2}\) estimators. Additionally, we present a graphical representation of the RMSE values of Table 1 in Appendix 2 to enhance clarity and understanding of the results.

4 Empirical application

4.1 The performance measures

In this section, we present a real-world empirical application to assess the performance of our proposed method. In this example, we investigate the in-sample properties and the out-of-sample forecasts of the different estimation methods. A clear link exists between the in-sample and out-of-sample properties of the estimators, as described by Breitung and Roling (2015). The out-of-sample forecast error may be decomposed in the following way:

$${y}_{t+h}-{\widehat{y}}_{\left.t+h\right|t}=\left(E\left({y}_{t+h}\right)-{\widehat{y}}_{\left.t+h\right|t}\right)+{u}_{t+h},$$

where \(E\left({y}_{t+h}\right)-{\widehat{y}}_{\left.t+h\right|t}\) corresponds to the estimation error and \({u}_{t+h}\) to the error term. Our methods, as shown in the mean square error comparison in Sect. 2.2 and the results of the simulations in Sect. 3.2, decrease the estimation error and therefore have the potential to improve the out-of-sample forecasts. To obtain the out-of-sample forecasts, we regress the future values of the dependent variable (mathematically denoted as \({y}_{t+h}\)) on current or past values of the regressor. In the first step, we split the sample into an estimation sample \(t=1,\dots , {T}^{e}\) and a forecasting sample \(t={T}^{e}+1,\dots ,T,\) where \({n}_{f}=T-{T}^{e}\) denotes the number of forecasted values. Given the estimated lag distribution from the estimation sample, one-step-ahead forecasts are constructed. The first observation from the estimation sample is dropped, while the end of the sample is extended to include the observation in the next period. We assess forecasts according to the root mean squared forecast error:

$$RMSE= \sqrt{\frac{1}{{n}_{f}}\sum_{t={T}^{e}+1}^{{T}^{e}+{n}_{f}}{\left({y}_{t+h}-{\widehat{y}}_{\left.t+h\right|t}\right)}^{2}}.$$
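A sketch of this forecasting exercise is given below (Python/NumPy; names are ours, `estimate` is any estimator returning a coefficient vector, e.g. the sls1 or sls2 sketches above with a chosen \(\lambda\), and the rows of X and the entries of y are aligned as in Eq. (2)).

```python
import numpy as np

def window_rmse(X, y, T_e, estimate, rolling=False):
    """One-step-ahead forecasts with an expanding window (rolling window of size T_e if rolling=True)."""
    errors = []
    for t in range(T_e, len(y)):
        start = t - T_e if rolling else 0
        beta = estimate(X[start:t], y[start:t])    # re-estimate on the current estimation sample
        errors.append(y[t] - X[t] @ beta)          # one-step-ahead forecast error
    return np.sqrt(np.mean(np.square(errors)))     # root mean squared forecast error
```

For example, `window_rmse(X, y, T_e, lambda Xe, ye: sls2(Xe, ye, lam)[0])` evaluates the \({SLS}_{2}\) forecasts for a given \(\lambda\).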

4.2 In-sample estimation and forecasting of the inflation rate

Frequent forecasts of the inflation rate have become increasingly important in economic decision making. The empirical importance for policymakers stems from the fact that many contracts are set in nominal terms while real prices are of practical relevance. An important indicator of the inflation rate is oil returns. Breitung et al. (2013) analyzed the predictive power of 15 different daily indicators for the German inflation rate. The results of the different MIDAS regressions demonstrated that the predictive power of crude oil prices for the monthly inflation rate yielded an \({R}^{2}>0.3\), while the other potential indicators exhibited an \({R}^{2}>0.1\). These results are in line with those in Bruneau et al. (2007), where energy prices were shown to play a dominant role in inflation forecasting, while other indicators, such as exchange rates, were shown to be poor predictors of inflation changes. Therefore, in this empirical application, we choose to forecast the US inflation rate based on crude oil price data. We use US data from the Federal Reserve Bank of St. Louis. The monthly data start in 1986 and end in 2020, and the in-sample period spans from February 1986 to December 2012. We assess the predictive capabilities of different possible MIDAS methods, including our proposed method. We utilize expanding-window forecasts to closely mirror our simulation results; it should be noted that our methods can be used with other forecasting schemes as well, such as a rolling window. We perform one-step-ahead out-of-sample forecasts, and the out-of-sample period is from January 2012 to January 2020. The oil price data are daily WTI crude oil prices (regressor), and the regressand consists of seasonally adjusted monthly data. To avoid nonstationarity issues, we transform both variables into first differences of the logs, in line with Breitung and Roling (2015); therefore, we forecast the inflation rate. The regressand is shown in Fig. 1, and the regressor is shown in Fig. 2. We observe some peaks, especially during the 1990s, but no major structural changes. The in-sample results are shown in Table 2, where we see a decrease in the RMSE of the new estimator. This decrease is substantial, and the ratio is as small as 0.2172 for \(p+1=40\). Furthermore, as expected based on the results in Sect. 2, we observe a substantial increase in \({R}^{2}\). The results for \({R}^{2}\) are in line with those obtained by Breitung et al. (2013) and show the importance of oil returns as an indicator of the inflation rate. This indicates a considerable improvement in the in-sample properties. In Fig. 3, we present graphs of the unrestricted estimated parameters and the \({SLS}_{1}\) and \({SLS}_{2}\) estimates. The SLS-type estimators smooth the erratic behavior of the unrestricted estimator. For \(p+1=20\), there appears to be a linear relationship between the variables, but when the lag length increases to \(p+1=60\), the lag distribution becomes more hump-shaped. Moreover, \({SLS}_{1}\) shrinks the coefficients to be very close to zero, while \({SLS}_{2}\) estimates a larger impact of the regressor on the inflation rate. In Table 3, we show the results for \({SLS}_{1}\) and \({SLS}_{2}\); furthermore, as benchmarks, we forecast using U-MIDAS and the parametric Estimated Almon Lag Polynomial (EALP).

Fig. 1

Plot of the monthly inflation rate

Fig. 2

Plot of oil returns

Table 2 In-sample estimation of the monthly inflation rate based on oil returns
Fig. 3

a–c Estimated lag distribution of the monthly inflation rate with a \(p+1=20\), b \(p+1=40\), and c \(p+1=60\). \({SLS}_{1}\) (straight line), \({SLS}_{2}\) (dashed line), Unrestricted (dashed-dotted line)

Table 3 RMSE of the out-of-sample forecast (window forecasting)

Table 3 displays the results from using the expanding-window technique to assess out-of-sample performance. It is important to highlight that our approach is not limited to the expanding-window method; other techniques, such as the rolling window, can also be employed. The results presented in Table 3 demonstrate a significant enhancement in out-of-sample performance, as confirmed by the statistically significant Diebold and Mariano test. This improvement exists for all lag lengths, but it is greatest for lag length 40. As shown in the simulation study, there is a substantial improvement in the in-sample properties, with a lower RMSE, which increases the precision of the estimated parameters and decreases the estimation error. In this empirical application, we note that this improvement also holds for real data and that it leads to a decrease in the out-of-sample forecasting error.

5 Summary and concluding remarks

Combining variables at different frequencies in the same model makes it difficult to estimate the distributed lag coefficients, since the effective sample size is determined by the low-frequency variable. Breitung and Roling (2015) propose a smoothed least squares approach (\({SLS}_{1}\)) for unrestricted distributed lag coefficients. In contrast to the Almon or Beta lag distributions used in Ghysels et al. (2006), Breitung and Roling’s (2015) \({SLS}_{1}\) approach ensures a flexible but smooth lag distribution. However, even if the biasing parameter of \({SLS}_{1}\) solves the overparameterization problem, this approach leads to a decreased goodness-of-fit. Therefore, we use a method based on Lipovetsky and Conklin (2005), where we generalize this shrinkage regression into a two-parameter smoothed least squares estimator (\({SLS}_{2}\)). Our \({SLS}_{2}\) estimator contains an additional parameter \(q\) that improves the in-sample fit. We also present mathematical derivations of the conditions under which our new estimator improves the MSE properties. The latter property is important in a forecasting context since it leads to a lower estimation error, which decreases the out-of-sample forecasting error. Furthermore, we conduct Monte-Carlo simulations of the in-sample properties, and in an empirical application, we show the potential benefit for out-of-sample forecasts. Our results clearly demonstrate the advantages of our new \({SLS}_{2}\) approach. Thus, compared to Breitung and Roling’s (2015) one-parameter ridge approximation (\({SLS}_{1}\)), our two-parameter (\({SLS}_{2}\)) shrinkage approach for mixed-frequency models is superior. It preserves orthogonality between the residuals and the predicted dependent variable, which results in a consistent solution. Furthermore, it improves the in-sample properties and is likely to improve out-of-sample forecasts compared to the \({SLS}_{1}\) estimator. Therefore, our new method can be applied as an alternative to previous approaches in the literature, especially when it is difficult to impose a structural pattern on the parameters. In comparison to previous research, we thus demonstrate the strength and practical relevance of our method in generating precise forecasts/nowcasts for important variables and models in economics.