A Factor Analysis for the Spanish Economy

We present a medium-scale dynamic factor model to estimate and forecast the rate of growth of the Spanish economy in the very short term. The intermediate size of the model overcomes the serious specification problems associated with large-scale models and the implicit loss of information of small-scale models. The estimated common factor is used to forecast Gross Domestic Product (GDP) by means of a transfer function model. The model also overcomes the operational and informational problems posed by an unbalanced panel of indicators, and it generates multivariate forecasts of the basic indicators.


INTRODUCTION
Business cycle analysis has been spurred by the severity of the recent downturn of the world economy. The assessment of economic policies requires timely and precise information about general macroeconomic conditions. In this vein, the use of standard measures of aggregate economic activity based on the Quarterly National Accounts (QNA) imposes a delay in the decision-making process that may hamper its effectiveness.
In order to alleviate the information constraints imposed by these standard measures, we design a coincident indicator to estimate the state of the business cycle in the very short term on a real-time basis. This attempt has some precedents, starting with Watson (1992, 2002), which may be considered modern descendants of the seminal work on cyclical indicators of Burns and Mitchell (1946). More recently, both central banks and academic institutions have created all sorts of real-time indicators and disseminated them through their websites. These estimates and forecasts influence policy-makers and shape public opinion. Notable examples are the indicator designed by Aruoba et al. (2009), published in real time by the Federal Reserve Bank of Philadelphia; the indicators of Chauvet (1996), both for the United States (U.S.) and for Brazil; Giannone et al. (2008), also for the U.S. economy; Angelini et al. (2008) for the Eurozone; and Pérez-Quirós (2009a, 2009b) for both the Eurozone and the Spanish economy.
All the preceding models are designed as either small-scale or large-scale models. Both methodologies present important shortcomings. On the one hand, small-scale models are relatively exposed to idiosyncratic shocks and suffer an implicit loss of information. On the other hand, the estimation of large-scale models by quasi-maximum likelihood methods, akin to those used in our model, can violate the weak cross-correlation assumption needed to ensure the consistency of their estimators. By contrast, our model has an intermediate size that provides a natural hedge against the pitfalls of both small-scale and large-scale models.
The debate concerning the forecasting performance of small-scale models versus large-scale models is still an open issue. Our main contribution to the literature is twofold. First, we increase the number of indicators in a controlled way, fulfilling the assumption of weak cross-correlation among the idiosyncratic components which ensures the consistency of the estimators. Second, our model combines dynamic factor analysis with transfer function modeling, instead of ad hoc bridge equations.
The common factor underlying the observed indicators is estimated by means of the Kalman filter, after a suitable reparameterization of the model in state space form. In this way, we solve simultaneously the problem posed by the presence of an unbalanced panel (i.e., indicators with non-overlapping samples) and the generation of forecasts for individual indicators using a multivariate approach.
It must be emphasized that these predictions of the individual indicators are made in an explicit multivariate setting, avoiding the overparameterization and overfitting risks posed by other approaches (e.g. VAR models). Therefore, when making individual forecasts, the model makes an efficient use of the information contained in related indicators.
Moreover, transfer function models provide a simple and quantitatively consistent relationship between the common factor and the macroeconomic aggregates, GDP in particular. This linkage allows us to compile a contemporaneous estimate of GDP on a real-time basis. These models also provide confidence intervals for the GDP estimates, quantifying the uncertainty that surrounds them.
This two-step approach (common factor estimation and transfer function) effectively disentangles the uncertainty due to the real-time estimation of actual business cycle conditions using monthly indicators from the uncertainty due to the relationship between GDP and monthly short-term indicators. This separation hedges us from idiosyncratic GDP changes that may distort the historical relationship between monthly indicators and quarterly macroeconomic aggregates measured by the QNA. Additionally, the fact that GDP compilation features¹ (chain-linking, benchmarking, seasonal adjustment and balancing) are so different from the usual compilation practices for short-term indicators suggests the use of a two-step approach such as the one used in this work. This methodology is applied to a broad set of monthly indicators of the Spanish economy, selected according to their economic significance, their temporal and statistical coverage, and an appropriate degree of source diversification. The size of the model (31 indicators) allows feasible computerized processing and reduces the risks implied by idiosyncratic shocks affecting the estimation and forecasting of the common factor as well as its link to quarterly GDP.
The document is organized as follows. The second section outlines the econometric methodology, detailing the nature of the dynamic factor model, its estimation by means of the Kalman filter and its relationship with macroeconomic variables using transfer function models. The third section presents the basic short-term indicators and their preliminary statistical treatment. The empirical results appear in section four. Finally, a set of appendices describes the technical details of the model, in order to ensure the self-contained nature of the text.

ECONOMETRIC APPROACH
The starting point of our modeling approach is a dynamic one-factor model that captures in a parsimonious way the dynamic interactions of a set of monthly economic indicators. The common factor of the system is estimated by means of the Kalman filter, after casting the factor model in state space form. On the basis of this factor we design a synthetic index that is related to quarterly aggregate output through a transfer function model. The entire procedure has been adapted to operate with unbalanced data panels, in order to forecast both the indicators and the macroeconomic aggregates in real time (nowcasting).

Dynamic factor model
Dynamic factor analysis is based on the assumption that a small number of latent variables generate the observed time series through a stochastically perturbed linear structure. Thus, the pattern of observed co-movements is decomposed into two parts: communality (variation due to a small number of common factors) and idiosyncratic effects (specific elements of each series, uncorrelated along the cross-section dimension).
¹ The concurrent use of these techniques in QNA compilation is an additional source of idiosyncrasy; see Abad et al. (2009).
In this paper we assume that the observed, stationary growth signals of k monthly indicators are generated by a factor model:

z_{i,t} = λ_i f_t + u_{i,t},  i = 1, …, k  [2.1]

where:
• z_{i,t}: i-th indicator growth signal at time t.
• λ_i: loading of the i-th indicator on the common factor.
• f_t: common factor at time t.
• u_{i,t}: specific or idiosyncratic component of the i-th indicator at time t.
The loadings λ_i measure the sensitivity of the growth signal of each indicator to changes in the factor. Equation [2.1] only considers static (i.e., contemporaneous) interactions among the observed indicators through their common dependence on a latent factor. The model should be expanded in order to adapt it to a time series framework, adding a dynamic specification for the common factor and the idiosyncratic elements.
A fourth-order autoregression, AR(4), provides a sufficiently general representation for the common factor:

φ_4(B) f_t = w_t,  w_t ~ iid N(0, 1)  [2.2]

In [2.2], B is the backward operator, B f_t = f_{t-1}, and the variance of the innovation has been normalized to one. Depending on the characteristic roots of φ_4(B), the model may exhibit a wide variety of dynamic behaviors.
We consider an AR(1) specification for the dynamics of the specific elements, allowing for some degree of persistence:

(1 − d_i B) u_{i,t} = v_{i,t},  v_{i,t} ~ iid N(0, ψ_i)  [2.3]

Finally, we assume that all the innovations of the system are orthogonal:

E(w_t v_{i,s}) = 0 ∀ i, t, s;  E(v_{i,t} v_{j,s}) = 0 ∀ i ≠ j, ∀ t, s  [2.4]

The model [2.1]-[2.4] attempts to represent the static as well as the dynamic features of the data. We estimate the common and idiosyncratic factors using the Kalman filter, after a suitable reparameterization of the model in state-space form. This reparameterization requires the introduction of a state vector that encompasses all the information needed to project future paths of the observed variables from their past realizations. In our case, this vector is:

X_t = (f_t, f_{t-1}, f_{t-2}, f_{t-3}, u_{1,t}, …, u_{k,t})′  [2.5]

The corresponding measurement equation is:

Z_t = H X_t  [2.6]

where H = [Λ | 0_{k×3} | I_k] and Λ = {λ_i, i = 1, …, k} represents the loading matrix. This equation allows us to derive the observed indicators from the (unobservable) state vector.
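As a concrete illustration of equations [2.1]-[2.4], the following sketch simulates a small version of the data-generating process. The number of indicators, loadings and autoregressive coefficients below are arbitrary illustrative choices, not the values estimated in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

k, T = 5, 600                               # indicators, months
phi = np.array([0.6, 0.1, 0.05, 0.05])      # AR(4) coefficients of the factor
lam = rng.uniform(0.5, 1.0, k)              # hypothetical loadings lambda_i
d = rng.uniform(0.1, 0.5, k)                # AR(1) coefficients d_i
psi = rng.uniform(0.2, 0.5, k)              # idiosyncratic innovation variances

# Common factor [2.2]: phi_4(B) f_t = w_t, w_t ~ N(0, 1)
f = np.zeros(T)
for t in range(4, T):
    f[t] = phi @ f[t - 4:t][::-1] + rng.standard_normal()

# Idiosyncratic components [2.3]: (1 - d_i B) u_{i,t} = v_{i,t}
u = np.zeros((T, k))
for t in range(1, T):
    u[t] = d * u[t - 1] + rng.normal(0, np.sqrt(psi), k)

# Observed growth signals [2.1]: z_{i,t} = lambda_i f_t + u_{i,t}
z = f[:, None] * lam[None, :] + u
```

The common dependence on f induces positive cross-correlation among the simulated indicators, which is exactly the communality the factor model is meant to capture.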
The transition equation completes the system and characterizes its dynamics:

X_t = G X_{t-1} + V_t  [2.7]

where G is a square matrix of dimension k+4 that stacks the AR(4) coefficients of the common factor and the AR(1) coefficients d_i of the specific components. The innovations vector V_t = (w_t, 0, 0, 0, v_{1,t}, …, v_{k,t})′ evolves as a Gaussian white noise with diagonal variance-covariance matrix Q. We assume that the time index t goes from 1 to T. The application of the Kalman filter requires Θ = [H, G, Q] to be known. Since the model is not small-scale, full-system maximum likelihood estimation of Θ is not feasible. Our solution is to derive these parameters from the static version of the model, estimated using bootstrap methods; see Appendices A and B for details.
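The filtering step described above can be sketched as follows. This is a minimal textbook Kalman filter for the noiseless measurement equation [2.6] and transition equation [2.7], not the production implementation used in the paper; a small jitter term is an added assumption to keep the gain computation numerically invertible:

```python
import numpy as np

def kalman_filter(Z, H, G, Q):
    """Minimal Kalman filter for Z_t = H X_t, X_t = G X_{t-1} + V_t,
    V_t ~ N(0, Q); returns the filtered state estimates X_{t|t}."""
    T, k = Z.shape
    m = G.shape[0]
    x = np.zeros(m)                 # X_{0|0}
    P = np.eye(m)                   # P_{0|0}
    out = np.zeros((T, m))
    for t in range(T):
        # Predict
        x = G @ x
        P = G @ P @ G.T + Q
        # Update (no measurement noise: Z_t = H X_t exactly)
        S = H @ P @ H.T + 1e-9 * np.eye(k)   # jitter for invertibility
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (Z[t] - H @ x)
        P = (np.eye(m) - K @ H) @ P
        out[t] = x
    return out
```

Feeding it the H, G and Q of the factor model, the first component of the filtered state is the estimate of the common factor f_t.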

Dealing with an unbalanced data panel
One of the major operational problems faced when analyzing multiple time series is the incomplete nature of the available information. In general, the availability of the different indicators is not homogeneous, which results in an unbalanced panel with non-overlapping samples. One way to deal with unbalanced panels consists in working only with complete panels, either in the time dimension or in the cross-section dimension. As shown in Figure 2.3, in the first case we may discard a large number of relevant indicators, with likely adverse effects on factor estimation accuracy and forecasting performance. In the second case, the number of observations may be too small when some series have a short span, making the forecasting or backcasting horizon too long.
Given these drawbacks, we propose a way to utilize all the available information, both on the cross-section dimension and on the time dimension. The method, which is partially based on Stock and Watson (2002) and Giannone et al. (2006), relies on an iterative process with the following steps:
I. Estimation of a static factor model by principal components using the longitudinal panel data. Obviously, the use of this panel involves a loss of information that will be compensated in the following stages.
II. The indicators that have been excluded from the longitudinal panel are individually regressed (by ordinary least squares, OLS) on the common factor. The estimated parameters are then used to calculate the missing data in these series from t=1 to t=T_1.

III. A new factor is calculated from the statically balanced panel, from t=1 to t=T_1, using the same procedure as in step I. Hence, new parameters Θ = [H, G, Q] are available.
IV. Using the new parameters Θ, we apply the Kalman filter from t=1 to t=T_1 to estimate the common factor. This factor is in turn projected up to t=T_2.
V. With the estimated common factor derived from step IV as a regressor, we rebalance the panel again using the same procedure as in step II.
Steps II to V are iterated until convergence is achieved. The convergence criterion states that the change in the likelihood function should not exceed a given threshold.
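The iterative balancing steps above can be sketched as follows. This simplified version uses only principal components and OLS refills (omitting the Kalman filter projection of step IV) and monitors the change in the factor rather than the likelihood, so it illustrates the logic rather than reproducing the exact algorithm:

```python
import numpy as np

def balance_panel(Z, max_iter=50, tol=1e-8):
    """Iterative balancing sketch: Z is a T x k panel with NaN for missing
    data. Missing values are refilled from OLS regressions on a
    principal-components factor, which is then re-extracted, until the
    factor stabilizes."""
    Zb = np.array(Z, float)
    mask = np.isnan(Zb)
    # Initialization: fill gaps with the column means of the observed data
    col_means = np.nanmean(Zb, axis=0)
    Zb[mask] = col_means[np.where(mask)[1]]
    f_old = np.zeros(Zb.shape[0])
    for _ in range(max_iter):
        # Factor = first principal component of the (balanced) panel
        Zc = Zb - Zb.mean(0)
        _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
        f = Zc @ Vt[0]
        if f @ f_old < 0:           # fix the sign indeterminacy of the PC
            f = -f
        # Refill each indicator's gaps from an OLS fit on the factor
        for j in range(Zb.shape[1]):
            obs = ~mask[:, j]
            X = np.column_stack([np.ones(obs.sum()), f[obs]])
            beta, *_ = np.linalg.lstsq(X, Zb[obs, j], rcond=None)
            Zb[mask[:, j], j] = beta[0] + beta[1] * f[mask[:, j]]
        if np.max(np.abs(f - f_old)) < tol:
            break
        f_old = f
    return Zb, f
```

On a toy panel with a short block of missing observations, a few iterations are typically enough for the refilled values and the factor to settle.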
The initial longitudinal panel should be wide enough to be representative, easing the usual trade-off between temporal coverage and cross-section coverage. After several tests, we selected January 1990 as the starting point of the panel data, providing a sensible balance in the above-mentioned trade-off.

Linkage with macroeconomic variables via transfer function modeling
One of the main goals of the model is to provide a connection between high-frequency indicators and the key variables that shape the macroeconomic scenario. A transfer function model achieves this in a simple and efficient way, providing real-time estimates of quarterly GDP from monthly indicators.
Once we have completed the estimation process of the dynamic factor model, and taking into account the basic nature of the indicators as (standardized) period-on-period rates of growth, we can follow Mariano and Murasawa (2003) and represent the factor at the quarterly frequency by combining the monthly observations according to:

f_T = (1/3) f_t + (2/3) f_{t-1} + f_{t-2} + (2/3) f_{t-3} + (1/3) f_{t-4}  [2.11]

where f_t represents the monthly dynamic common factor and f_T is its temporally aggregated (quarterly) counterpart, with time indexes related by T = 3t. Hence, quarter T comprises months t, t-1 and t-2.
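The aggregation rule [2.11] can be implemented directly. The helper below assumes a zero-based monthly index, so the first quarter that can be fully aggregated ends at month t = 5:

```python
import numpy as np

def quarterly_factor(f_monthly):
    """Mariano-Murasawa temporal aggregation of a monthly growth-rate
    factor: f_T = (1/3)f_t + (2/3)f_{t-1} + f_{t-2} + (2/3)f_{t-3}
    + (1/3)f_{t-4}, evaluated at the last month t of each quarter."""
    w = np.array([1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3])   # weights on f_t ... f_{t-4}
    f = np.asarray(f_monthly, float)
    out = []
    for t in range(5, len(f), 3):                     # t = last month of a quarter
        out.append(w @ f[t:t - 5:-1])
    return np.array(out)
```

Since the weights sum to 3, a constant monthly growth rate c is mapped to a quarterly rate of 3c, as expected when monthly rates compound (approximately, in logs) into a quarterly rate.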
We consider that the dynamic relationship at the quarterly frequency between the common factor and GDP may be articulated using a linear transfer function:

y_T = c + V(B) f_T + n_T  [2.12]

where:
• y_T is the quarter-on-quarter rate of growth of GDP.
• f_T is the dynamic common factor, temporally aggregated according to Mariano-Murasawa.
• n_T is a stochastic disturbance that obeys a stationary and invertible ARMA(p,q) model.
The intercept c represents the non-stochastic component of y_T and V(B) is the filter that passes on the information contained in f_T to contemporaneous and future values of y_T.
In order to specify the impulse response V(B) in a parsimonious way, we follow Box and Jenkins (1976) and represent it in rational form. Hence, the model [2.12] becomes:

y_T = c + (ω_s(B) B^b / δ_r(B)) f_T + (θ_q(B) / φ_p(B)) u_T  [2.13]

where u_T ~ iid N(0, v_u) and δ_r(B), ω_s(B), φ_p(B) and θ_q(B) are polynomials in the backward operator B with orders r, s, p and q, respectively. We assume that all of them have their roots outside the unit circle. The term b ≥ 0 is the pure delay of the transfer function.
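For intuition, the rational form can be expanded back into the impulse-response weights of V(B) by polynomial long division. The helper below is a generic sketch, using the convention δ(B) = 1 − δ1·B − δ2·B² − …:

```python
import numpy as np

def impulse_weights(omega, delta, b=0, n=12):
    """Expand V(B) = B**b * omega(B) / delta(B) into the impulse-response
    weights v_0, ..., v_{n-1}. omega = [w0, w1, ...] is the numerator;
    delta = [d1, d2, ...] encodes delta(B) = 1 - d1*B - d2*B**2 - ..."""
    v = np.zeros(n)
    for j in range(n):
        # Coefficient of B**j in omega(B) * B**b
        num = omega[j - b] if 0 <= j - b < len(omega) else 0.0
        # Recursive part from the denominator: sum_i d_i * v_{j-i}
        rec = sum(delta[i - 1] * v[j - i]
                  for i in range(1, len(delta) + 1) if j - i >= 0)
        v[j] = num + rec
    return v
```

With ω(B) = ω0 and δ(B) = 1 − δ1·B, for instance, the weights decay geometrically, v_j = ω0·δ1^j, which is the simple distributed-lag pattern the rational form encodes with only two parameters.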
We arrive at the final form for [2.13] following the adaptive methodology of Box-Jenkins, refined and tailored to the transfer function case by Liu and Hanssens (1982), Hanssens and Liu (1983) and Tsay and Wu (2003), among others. In particular, tentative identification of the orders b, r and s of the (rational) impulse response is performed using the corner method (Beguin et al., 1980) as implemented by Liu (2005). The orders p and q of the model for the perturbation are determined using the so-called Smallest Canonical Analysis (SCAN), see Tsay and Tiao (1985).
This methodology provides a statistically well-rooted method to determine the dynamic form of the relationship between y T and f T , avoiding ad hoc data mining and other pitfalls of the standard bridge equation approach.

DATA
This section details the indicators that have been selected for model estimation and the preliminary processing that they have gone through.

Selection of indicators
Given the objective of the model and the econometric methodology at hand, we have made a relatively wide selection of monthly indicators. The selection process was carried out under the premise that indicators should be timely available and should provide a synthetic measure of the growth rate of the Spanish economy, being selected at their most aggregated level. Additionally, they should have a correlation with GDP growth greater than 0.4 in absolute value. The 31 selected economic indicators, listed in Table 3.1, can be divided into five large blocks.
The first set includes information related to domestic production. Among them we include the traditional series used to capture the evolution of economic activity, such as the apparent consumption of cement, energy consumption or the industrial production index.
In the second block we have considered those economic variables related to the external sector, such as exports and imports of goods and services suitably deflated.
The third block consists of "soft" or qualitative indicators, where the economic sentiment indicator plays an important role due to its prompt availability. The fourth block contains the financial variables, represented by (deflated) credit to firms and households.
Finally, the number of social security contributors, the number of registered contracts and the number of employed persons provided by the Labor Force Survey (LFS)³ stand for the aggregate evolution of the Spanish labor market.

Preliminary processing
As already mentioned, the objective of the model is to provide a synthetic measure of the rate of growth of the economy. This goal requires identifying a reliable signal of growth to be fitted by the factor model. In practice, the identification of this signal requires applying a filter to the series that isolates their secular trend (long term) from their cyclical evolution (short term). A detailed analysis of the different measures of economic growth can be found in Melis (1991) and Espasa and Cancelo (1993).
In order to emphasize the short-term information contained in the indicators, we have chosen the regular first difference of the log time series to perform such decomposition. This high-pass filter has been applied to the seasonally adjusted series (Maravall and Gómez, 1996; Caporello and Maravall, 2004). Formally, the transformation is:

z_{i,t} = ∇ log(Z_{i,t}) = log(Z_{i,t}) − log(Z_{i,t-1})

In the specific case of "soft" series, typically measured as balances of qualitative responses, the transformation applied is the first difference of the levels:

z_{i,t} = ∇ Z_{i,t} = Z_{i,t} − Z_{i,t-1}

³ The data provided by the LFS are the only ones compiled on a quarterly basis. In order to preserve the monthly nature of the data set, we have used temporal disaggregation techniques to derive consistent monthly figures, see Boot et al. (1967). The transformation has been applied to the seasonally adjusted levels.
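The two transformations can be written as a single helper. The soft-series branch (a plain first difference of the balance) is our reading of the text, since balances may take negative values and do not admit logs; the standardization option reflects the paper's use of standardized growth rates:

```python
import numpy as np

def growth_signal(x, soft=False, standardize=True):
    """Stationary growth signal of a monthly indicator: first difference
    of the log for 'hard' positive-valued series, plain first difference
    for 'soft' series measured as balances of qualitative responses."""
    x = np.asarray(x, float)
    z = np.diff(x) if soft else np.diff(np.log(x))
    if standardize:
        z = (z - z.mean()) / z.std()
    return z
```

For a series growing at a constant 1% per month, the log-difference branch returns a flat 0.01 signal, while a balance series such as [1, −2, 3] yields the level changes [−3, 5].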

EMPIRICAL RESULTS
The loading vector is estimated by means of principal components factor analysis combined with resampling techniques, suitably adapted to the time series context by Politis and Romano (1994). Estimation is based on 10,000 bootstrap replicates. The resampling procedure uses the stationary bootstrap with an expected block size of 41 months. This method provides a measure of the precision of the point estimates and does not require any assumption concerning the distributional features of the data. See Appendix A for details. The following table shows the results, sorted from highest to lowest. Using these results, we estimate the parameters of the static common factor model by ordinary least squares, obtaining:
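A reduced-scale sketch of this procedure (with far fewer replicates than the 10,000 used in the paper) could look as follows; the functions implement the stationary bootstrap of Politis and Romano (1994) and a first-principal-component loading estimate:

```python
import numpy as np

def stationary_bootstrap(T, expected_block=41, rng=None):
    """Politis-Romano stationary bootstrap: returns resampled time indexes.
    Block lengths are geometric with mean `expected_block`; blocks start at
    uniformly drawn positions and wrap around the sample."""
    rng = rng or np.random.default_rng()
    p = 1.0 / expected_block
    idx = np.empty(T, dtype=int)
    t = 0
    while t < T:
        start = rng.integers(T)
        length = rng.geometric(p)
        for j in range(min(length, T - t)):
            idx[t + j] = (start + j) % T
        t += length
    return idx

def pc_loadings(Z):
    """Loading estimate from the first principal component of the
    correlation matrix."""
    R = np.corrcoef(Z, rowvar=False)
    mu, E = np.linalg.eigh(R)            # eigenvalues in ascending order
    lam = E[:, -1] * np.sqrt(mu[-1])
    return lam * np.sign(lam.sum())      # fix the sign indeterminacy

def bootstrap_loadings(Z, reps=500, expected_block=41, seed=0):
    """Bootstrap mean and standard error of the loadings."""
    rng = np.random.default_rng(seed)
    draws = np.array([
        pc_loadings(Z[stationary_bootstrap(len(Z), expected_block, rng)])
        for _ in range(reps)])
    return draws.mean(0), draws.std(0)
```

The bootstrap standard deviations across replicates play the role of standard errors for the loadings, with no distributional assumption on the data.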

Table 4.2: Common factor: AR(4) estimates
Applying the same estimation procedure to the specific factors yields their AR(1) parameter estimates. The dynamic common factor is then estimated using the Kalman filter, and its quarterly counterpart is obtained by temporal aggregation using the Mariano-Murasawa formula. It shows a remarkable conformity with GDP growth, as may be appreciated in the corresponding graph. The cross-correlation function also shows a high degree of conformity between the common factor of the system and GDP. The function has a maximum at lag zero, confirming the coincident nature of the factor with respect to GDP. Moreover, its asymmetric shape points to a tendency of the factor to lead GDP. This feature is very convenient for nowcasting and short-term forecasting.

Note: Negative (positive) lags indicate that the factor is leading (lagging) GDP.
Following the methodology described in Liu (2005), the orders finally selected for the transfer function are: b=0, s=r=1 and p=q=0. The formal expression is:

y_T = c + ((ω_0 − ω_1 B) / (1 − δ_1 B)) f_T + u_T  [4.1]

Moreover, a separate multivariate analysis, based on the estimation of a vector autoregressive moving average (VARMA) model, clearly ascertains a unidirectional Granger causality that goes from the factor to GDP and not vice versa. This lack of feedback justifies the use of a transfer function. Furthermore, this analysis suggests a similar tentative model: b=0, r=s=1 and p=q=1. It was found that the modeling of the disturbance may ultimately be simplified, obtaining p=q=0. See Appendix C for additional details on the VARMA analysis.
The next table displays the estimation of the transfer function model by exact maximum likelihood:

Table 4.4: Transfer function estimates
Following Tsay and Tiao (1985) we have performed a canonical analysis of the residuals (the so-called Smallest Canonical Analysis, SCAN). The results do not show any major inadequacy, in line with the autocorrelation function.
There is some evidence of changing volatility, reflected in the kurtosis (3.62), in the autocorrelation of the squared residuals (systematically positive) and in the variability of the variance of the residuals, as shown in the corresponding graph. However, this evidence is not strong enough to reject the Gaussianity assumption using the Jarque-Bera test, but it deserves additional analysis using more sophisticated methods in future research (e.g., stochastic volatility models).
The dynamics implied by the estimated transfer function reveal the high degree of persistence of GDP and the critical role of the factor in providing a fast adjustment to its long-run average growth.

In order to evaluate the forecasting performance of the model, we have carried out several backtesting exercises. In all cases, the model has proved its usefulness as a tool for short-term economic analysis and the assessment of the growth pattern. As an example, the corresponding graph shows the good tracking properties of the model during the previous four years.

We have compared the predictions made by the transfer function with those generated by three standard univariate models used in forecasting GDP growth: a random walk, I(1); a first-order autoregressive moving average, ARMA(1,1); and a fourth-order autoregression, AR(4). The first one represents a "no change" assumption, the second one a univariate transposition of the VARMA(1,1) model, and the third one considers only pure AR representations.
The table below shows alternative measures of the forecasting performance of the models over the span 2006:Q1-2009:Q4: the root mean squared error (RMSE) and the mean absolute error (MAE), both computed for one-step-ahead forecasts. This time span has been chosen to cover both a period of high growth and a period of sharp and deep contraction of aggregate output.
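Both accuracy measures are straightforward to compute from the one-step-ahead forecast errors:

```python
import numpy as np

def rmse(errors):
    """Root mean squared one-step-ahead forecast error."""
    e = np.asarray(errors, float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(errors):
    """Mean absolute one-step-ahead forecast error."""
    e = np.asarray(errors, float)
    return float(np.mean(np.abs(e)))
```

RMSE penalizes large misses more heavily than MAE, so comparing the two across models also hints at whether a model's losses come from a few large errors or from many small ones.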
The competitive edge of the transfer function model relies on its efficient use of monthly information combined with a proper dynamic specification, leading to better outcomes than its peers. As an example of the production mode of the system, the following graph plots the evolution of the real-time forecast of fourth-quarter GDP on a daily basis, including its ±σ confidence interval. Observing the graph, we can see how the model reacts to the arrival of data updates. This process reduces the amplitude of the confidence interval, as the cross-sectional estimates are replaced by actual data. Initially, when only "soft" indicators are available, the estimate remains around 0%. When "hard" information concerning October arrives (industrial production and large companies sales), the estimate changes and begins to fluctuate around −0.1%. At the end of the interval, there is an upward jump due to the arrival of new information contained in the services sector turnover index, the industrial order books index and the industrial turnover index. As may be seen in Table 4.1, all of these belong to the most relevant indicators in terms of their loadings. The final forecast is −0.1145%.

Real-time forecasts
The complete picture shows a general movement away from positive values, although less contractive than in the previous quarter (−0.3009%). These forecasts were in close agreement with the GDP flash release disseminated by the National Statistical Institute (−0.1%), a figure later revised to −0.1475%.

CONCLUSIONS
In this paper we have designed a real-time, coincident indicator of the Spanish business cycle. It has a straightforward interpretation as the dynamic common factor of a set of representative short-term monthly economic indicators. This synthetic indicator also plays a critical role in GDP forecasting, by means of a suitable dynamic projection based on transfer function modeling.
The model differs from others proposed in the literature in its medium scale. This feature provides a certain advantage over small-scale models due to its higher information content and, at the same time, avoids the technical problems concerning the consistency of the estimators that hamper large-scale models. Moreover, its two-step approach strengthens the operative characteristics of the model, providing a hedge against changes in the relationship between indicators and macroeconomic aggregates.
This work could be extended in several directions. The incorporation of leading indicators would enrich the dynamic structure of the model. Another possibility is to apply this methodology to other macroeconomic aggregates, with the demand-side components of GDP as prime candidates. In any case, since the model is eminently empirical, its use in production mode will determine the way forward, including changes in the list of indicators and refinements of the estimation procedures.
APPENDIX A: ESTIMATION OF THE STATIC FACTOR MODEL

The normalized eigenvector e_1 associated with the largest eigenvalue μ_1 of the correlation matrix R of Z provides an estimate of the loading matrix L:

L̂ = e_1 μ_1^{1/2}  [A.2]

The variance-covariance matrix of the specific factors is then estimated as a residual:

Ψ̂ = diag(R − L̂ L̂′)  [A.3]

In order to obtain estimates of L and Ψ with appropriate standard errors, we apply [A.2] and [A.3] to the resampled time series. Resampling is performed using the bootstrap technique suggested by Politis and Romano (1994), in which resampling is applied with replacement to blocks of varying size. The block size is selected each time according to a predefined probability distribution. In our application we have used the geometric distribution with an expected block size of 41 months. The results are robust with respect to alternative mean block sizes. The estimation is repeated 10,000 times and the corresponding averages and standard deviations provide the estimates for L and Ψ.
The stationary bootstrap provides more robust results than other resampling methods, notably those procedures based on the use of fixed size blocks, e.g. Künsch (1989). In fact, the former may be considered as a weighted average over block size of the latter, generating a smoothed version of it.
With the resulting point estimates of L and Ψ, we transform the original factor model into one akin to a multivariate regression model. Hence, an initial estimate of the factor can be obtained using generalized least squares (GLS):

f̂_t = (L̂′ Ψ̂⁻¹ L̂)⁻¹ L̂′ Ψ̂⁻¹ Z_t  [A.4]

A complete analysis of these issues can be found in Mardia et al. (1979).
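Because Ψ is diagonal in this model, the GLS formula [A.4] reduces to a precision-weighted cross-sectional average of the indicators, which can be coded in a few lines:

```python
import numpy as np

def gls_factor(Z, L, Psi):
    """GLS estimate of the static factor, equation [A.4]:
    f_t = (L' Psi^{-1} L)^{-1} L' Psi^{-1} Z_t, with Psi diagonal
    (Psi passed as the vector of idiosyncratic variances)."""
    L = np.asarray(L, float)
    w = L / np.asarray(Psi, float)      # Psi^{-1} L for diagonal Psi
    return (Z @ w) / (L @ w)
```

Indicators with small idiosyncratic variance receive larger weights; in the noiseless limit Z_t = L f_t, the formula recovers f_t exactly for any positive Ψ.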
From [A.4] we can compute a summary measure of the empirical content of each real-time forecast. First, we define the weighted availability of each indicator:

a_{j,t} = θ_j I_{j,t}

where:
• θ_j: score of indicator j.
• I_{j,t}: binary variable that indicates the availability of indicator j at time t.
Aggregating the above expression along its cross-section dimension, we finally derive a synthetic measure of the information content of the forecast made at time t:

A_t = Σ_j θ_j I_{j,t} / Σ_j θ_j

APPENDIX B: THE KALMAN FILTER

The second module, {Update}, can be described as follows: once the actual data have been observed at time t, the algorithm updates the state vector estimate, adjusting the prediction made at t-1 according to the prediction error. The corresponding expressions are:

X_{t|t} = X_{t|t-1} + K_t (Z_t − H X_{t|t-1})
P_{t|t} = (I − K_t H) P_{t|t-1}

The matrix K_t (the Kalman gain) reflects the degree of adjustment that should be applied to the prediction error. It depends on the accuracy of the estimate of the state vector, on the volatility of the prediction errors and on the sensitivity of the observed series with respect to the state vector:

K_t = P_{t|t-1} H′ (H P_{t|t-1} H′)⁻¹

The estimation of dynamic factor models is analyzed in Watson and Engle (1983), Watson and Kraft (1984) and Kim and Nelson (1999), among others. The Kalman filter is explained in O'Connell (1984) and Kim and Nelson (1999).

APPENDIX C: VARMA ANALYSIS
In this section we estimate a vector autoregressive moving-average (VARMA) model to summarize the econometric relationship between the dynamic common factor (f t ) and the GDP quarter on quarter rate of growth (y t ), see Tiao and Box (1981), Lütkepohl (1991), Reinsel (1993) and Tiao (2001), for an in-depth analysis of such models.
Consider a k-dimensional vector Z_t which evolves following a VARMA(p,q) model, expressed by the following equation:

Φ_p(B) Z_t = Θ_q(B) U_t

where Φ_p(B) and Θ_q(B) are polynomial matrix operators of degree p and q, respectively. Furthermore, the vector U_t can be characterized by the following distributional properties:

U_t ~ iid N(0, Σ)

with Σ, in general, a non-diagonal matrix. Additionally, it is assumed that all the roots of the determinantal polynomials |Φ_p(B)| and |Θ_q(B)| lie on or outside the unit circle.
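A minimal simulator for such a process helps to make the causality structure concrete: with a lower-triangular Φ (a zero coefficient of GDP in the factor equation), the factor Granger-causes GDP but not vice versa. This is an illustrative sketch with arbitrary parameter values, not the estimation code:

```python
import numpy as np

def simulate_varma11(Phi, Theta, Sigma_chol, T, rng):
    """Simulate a k-dimensional VARMA(1,1) process:
    Z_t = Phi Z_{t-1} + U_t - Theta U_{t-1}, with U_t ~ N(0, Sigma)
    and Sigma_chol a Cholesky factor of Sigma."""
    k = Phi.shape[0]
    Z = np.zeros((T, k))
    U_prev = np.zeros(k)
    for t in range(1, T):
        U = Sigma_chol @ rng.standard_normal(k)
        Z[t] = Phi @ Z[t - 1] + U - Theta @ U_prev
        U_prev = U
    return Z
```

Regressing the simulated factor on its own lag and on lagged GDP recovers a near-zero GDP coefficient, which is the sample counterpart of the no-feedback restriction.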
The canonical correlation analysis of Tsay-Tiao suggests that a low-order VARMA(1,1) provides a reasonable fit to the data. This model serves as a benchmark to check the adequacy of several specifications concerning the direction of (Granger) causality. The results are summarized in the following table:

Granger-causality analysis
The results strongly support the hypothesis that the dynamic common factor is an input in the determination of GDP and that the use of transfer function is well grounded.
The estimation of the constrained VARMA(1,1) model by exact maximum likelihood (the constraint c2 = 0 is considered in addition to the ones defined in the second row of the preceding table) yields the following results:

Constrained maximum likelihood estimation
Note: 0 and --mean restricted parameters. Γ is the correlation matrix linked to Σ.
The residuals obtained from the VARMA model do not show any major inadequacy, as may be seen from their corresponding SCAN table. To further analyze the underlying structure of the VARMA model, we perform a canonical analysis, following Box and Tiao (1977). The results are summarized in the following table: