Abstract
This paper develops dimension reduction techniques for panel data analysis when the numbers of individuals and indicators are both very large. We use principal component analysis to represent a large number of indicators by a small number of common factors in the factor models. We propose the dynamic mixed double factor model (DMDFM) to capture cross-section and time series correlation through an interactive factor structure. DMDFM not only reduces the dimension of the indicators but also handles the mixed time series and cross-section effect. Unlike other models, mixed factor models have two types of common factors: the regressor factors reflect the common trend and achieve the dimension reduction, while the error-component factors reflect differences and weak correlation among individuals. Monte Carlo simulation results show that the generalized method of moments estimators have good unbiasedness and consistency properties. The simulation results also show that the DMDFM can effectively improve the predictive power of the models.
References
Ahn SG, Lee YH, Schmidt P (2001) GMM Estimation of linear panel data models with time-varying individual effects. J Econ 101:219–255
Andrews DWK (2005) Cross-section regression with common shocks. Econometrica 73:1551–1585
Anderson B, Deistler M (2008) Generalized linear dynamic factor models—a structure theory. In: 2008 IEEE conference on decision and control
Arellano M, Bond SR (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev Econ Stud 58:277–297
Arellano M, Bover O (1995) Another look at the instrumental variable estimation of error components models. J Econ 68:29–51
Bai J (2003) Inferential theory for factor models of large dimensions. Econometrica 71:135–173
Bai J (2009) Panel data models with interactive fixed effects. Econometrica 77:1229–1279
Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70:191–221
Chamberlain G, Rothschild M (1983) Arbitrage, factor structure and mean-variance analysis in large asset markets. Econometrica 51:1281–1304
Fan J, Fan Y, Lv J (2008) High dimensional covariance matrix estimation using a factor model. J Econ 147:186–197
Forni M, Hallin M, Lippi M, Reichlin L (2000) The generalized dynamic factor model: identification and estimation. Rev Econ Stat 82:540–554
Hallin M, Liska R (2007) Determining the number of factors in the general dynamic factor model. J Am Stat Assoc 102:603–617
Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50:1029–1054
Harding M, Nair KK (2009) Estimating the number of factors and lags in high dimensional dynamic factor models. Mimeo
Hsiao C (2003) Analysis of panel data. Cambridge University Press, New York
Mallows CL (1973) Some comments on Cp. Technometrics 15:661–675
Moon HR, Perron B (2004) Testing for a unit root in panels with dynamic factors. J Econ 122:81–126
Newey W, McFadden D (1994) Large sample estimation and hypothesis testing. In: Engle RF, McFadden D (eds) Handbook of econometrics. North Holland, Amsterdam, pp 2111–2245
Pesaran MH (2006) Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74:967–1012
Ross S (1976) The arbitrage theory of capital asset pricing. J Econ Theory 13:341–360
Stock JH, Watson MW (2002) Forecasting using principal components from a large number of predictors. J Am Stat Assoc 97:1167–1179
Stock JH, Watson MW (2005) Implications of dynamic factor models for VAR analysis. Princeton University, Princeton
Acknowledgements
This study was funded by the National Natural Science Foundation of China (714711730, 71873137 & 71271210) and supported by the fund for building world-class universities (disciplines) of Renmin University of China. Fang’s study was funded by the Philosophy and Social Science Fund of Anhui (AHSKY2015D53).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by Y. Ni.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proof of theoretical results
A. Proof of Theorem 4.1.
Denote \(b(z,\beta )=Z_{i}\Delta \epsilon _{i}\), where \(\beta =(\beta _{L}^{'},\beta _{F}^{'})^{'}\). From Eq. (4.8), we have \(E[b(z,\beta )]=0\). We take the partial derivative with respect to each parameter to be estimated, \(\partial b(z,\beta )/\partial \beta \), and then let
By the uniform consistency of the random disturbance term, a Taylor series expansion around \(\beta _{L}\) and \(\beta _{F}\) gives:
where \({\hat{\beta }}=({\hat{\beta }}_{L}^{'},{\hat{\beta }}_{F}^{'})^{'}\), and \(\beta _{L}^{*}\) and \(\beta _{F}^{*}\) lie between \(\beta _{L}\) and \({\hat{\beta }}_{L}\), and between \(\beta _{F}\) and \({\hat{\beta }}_{F}\), respectively. Multiplying both sides by the weighting matrix \(A\) gives:
Given the following three items:
- (i)
Under the preceding assumptions, given the optimal weighting matrix \(A_{O}\), we obtain a unique optimal estimator of \(\beta \). \(\beta \) is a continuous vector defined on the Euclidean space \(R^{n}\), and the parameter space \(\Theta \) formed by \(\beta \) is a closed and bounded subset of \(R^{n}\).
- (ii)
For \(b(z,\beta )=Z_{i}\Delta \epsilon _{i}\) and \(\forall \epsilon >0\), from (6.1),
$$\begin{aligned} E(b(z,{\hat{\beta }}))=b(z,\beta ) \end{aligned}$$
so
$$\begin{aligned} \big |b(z,{\hat{\beta }})-b(z,\beta )\big |\xrightarrow {p}0 \end{aligned}$$ (6.3)
For a given matrix \(A\), denote
$$\begin{aligned} \hat{S}_{N}(\beta )=b\big (z,{\hat{\beta }}\big )^{'}\hat{A}b\big (z,{\hat{\beta }}\big ) \end{aligned}$$
and
$$\begin{aligned} S_{0}(\beta )=b(z,\beta )^{'}Ab(z,\beta ) \end{aligned}$$
From (6.3), \(S_{0}(\beta )\) is continuous.
- (iii)
Next, we prove that \(S_{0}(\beta )\) converges with probability 1.
Using the triangle inequality,
Using the Cauchy–Schwarz inequality,
because
we have
By the uniform convergence theorem of Newey and McFadden (1994), the conclusion follows. \(\square \)
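For orientation, the triangle and Cauchy–Schwarz steps in consistency arguments of this kind usually take the following standard form, as in Newey and McFadden (1994). This is a sketch under assumed notation rather than the authors' own display: \(\hat{b}\) stands for the sample moment vector evaluated at a given \(\beta \), \(b_{0}\) for its population counterpart, and \(\hat{A}\xrightarrow {p}A\); the symbols \(\hat{b}\) and \(b_{0}\) are introduced here for illustration only.

$$\begin{aligned}&\big |\hat{b}^{'}\hat{A}\hat{b}-b_{0}^{'}Ab_{0}\big | \\&\quad \le \big |(\hat{b}-b_{0})^{'}\hat{A}(\hat{b}-b_{0})\big |+\big |b_{0}^{'}(\hat{A}+\hat{A}^{'})(\hat{b}-b_{0})\big |+\big |b_{0}^{'}(\hat{A}-A)b_{0}\big | \\&\quad \le \Vert \hat{b}-b_{0}\Vert ^{2}\Vert \hat{A}\Vert +2\Vert b_{0}\Vert \Vert \hat{b}-b_{0}\Vert \Vert \hat{A}\Vert +\Vert b_{0}\Vert ^{2}\Vert \hat{A}-A\Vert \end{aligned}$$

The first inequality applies the triangle inequality to the exact decomposition of the difference between the two quadratic forms, and the second applies the Cauchy–Schwarz inequality term by term. If \(\hat{b}\) converges to \(b_{0}\) uniformly in probability over the compact set \(\Theta \) and \(b_{0}\) is bounded there, every term on the right-hand side vanishes uniformly, which is the uniform convergence of the objective function that the consistency theorem requires.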
B. Proof of Theorem 4.2.
- (1)
Because
$$\begin{aligned}&\partial R_{1}(\beta _{L},\beta _{F})/\partial \beta =\partial \left( b(z,\beta )^{'}Ab(z,\beta )\right) /\partial \beta \\&\quad =\left( \partial b(z,\beta )^{'}/\partial \beta \right) Ab(z,\beta )+\left( \partial b(z,\beta )^{'}/\partial \beta \right) Ab(z,\beta ) \\&\quad =2\left( \partial b(z,\beta )^{'}/\partial \beta \right) Ab(z,\beta ) \end{aligned}$$
where \(\beta =(\beta _{L},\beta _{F})^{'}\) for notational simplicity. With this notation, the GMM estimator solves the first-order condition, so we obtain
$$\begin{aligned} R_{1}({\hat{\beta }})^{'}Ab(z,{\hat{\beta }})=0 \end{aligned}$$ (6.4)
From (6.1), for the optimal matrix \(A_{O}\), we have
$$\begin{aligned}&R_{1}(\beta )^{'}A_{O}b(z,{\hat{\beta }})=R_{1}(\beta )^{'}A_{O}\sqrt{N}b(z,{\hat{\beta }}) \\&\quad +\,o(b(z,\beta )) \end{aligned}$$ (6.5)
Using a Taylor series expansion around \(\beta \),
$$\begin{aligned}&R_{1}(\beta )^{'}A_{O}b(z,{\hat{\beta }})=R_{1}(\beta )^{'}A_{O} \left( \sqrt{N}b(z,\beta ) \right. \\&\quad \left. +\,R_{1}(\beta )\sqrt{N}\left( {\hat{\beta }}-\beta \right) \right) +o(b(z,\beta )) \end{aligned}$$
From (6.4), we have
$$\begin{aligned}&R_{1}(\beta )^{'}A_{O}R_{1}(\beta )\sqrt{N}\left( {\hat{\beta }}-\beta \right) \\&\quad =-R_{1}(\beta )^{'}A_{O}\sqrt{N}b(z,\beta )+o(b(z,\beta )) \end{aligned}$$
so
$$\begin{aligned}&\sqrt{N}({\hat{\beta }}-\beta )=-\left( R_{1}(\beta )^{'}A_{O}R_{1}(\beta )\right) ^{-1}\\&\quad R_{1}(\beta )^{'}A_{O}\sqrt{N}b(z,\beta )+o(b(z,\beta )) \end{aligned}$$
By Eq. (4.15), we have
$$\begin{aligned} \sqrt{N}b(z,\beta )\xrightarrow {d}N(0,D_{1}) \end{aligned}$$
and
$$\begin{aligned} \left( R_{1}(\beta )^{'}A_{O}R_{1}(\beta )\right) ^{-1}R_{1}(\beta )^{'}A_{O} \end{aligned}$$
is a deterministic matrix, so
$$\begin{aligned} \sqrt{N}\left( {\hat{\beta }}-\beta \right) \xrightarrow {d}N(0,\Sigma _{1}) \end{aligned}$$
i.e.,
$$\begin{aligned} \sqrt{N}\left( \left( {\hat{\beta }}_{L},{\hat{\beta }}_{F}\right) -(\beta _{L},\beta _{F})\right) \xrightarrow {d}N(0,\Sigma _{1}) \end{aligned}$$
- (2)
The proof is similar to that of (1) and is omitted.
\(\square \)
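The derivation in part (1) implies a limiting variance of the usual GMM sandwich form, \((R^{'}AR)^{-1}R^{'}ADAR(R^{'}AR)^{-1}\). As a minimal numerical sketch, assuming a plain linear instrumental-variable model rather than the DMDFM itself, the code below computes a GMM estimate and the plug-in sandwich variance with NumPy. The function names `linear_gmm` and `sandwich_variance`, the simulated design, and the choice of \((Z^{'}Z/N)^{-1}\) as the weighting matrix are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def linear_gmm(y, X, Z, A):
    """GMM estimate for the moment condition E[Z_i (y_i - X_i' beta)] = 0."""
    N = len(y)
    G = Z.T @ X / N          # sample Jacobian of the moments (up to sign)
    m = Z.T @ y / N
    # First-order condition G'A(m - G beta) = 0 solved for beta.
    return np.linalg.solve(G.T @ A @ G, G.T @ A @ m)


def sandwich_variance(y, X, Z, beta, A):
    """Plug-in variance of sqrt(N)*(beta_hat - beta) for a symmetric A."""
    N = len(y)
    G = Z.T @ X / N
    Zu = Z * (y - X @ beta)[:, None]   # per-observation moments Z_i * u_i
    D = Zu.T @ Zu / N                  # estimate of Var(sqrt(N) * sample moment)
    bread = np.linalg.inv(G.T @ A @ G)
    return bread @ G.T @ A @ D @ A @ G @ bread


# Simulated example (hypothetical design): one endogenous regressor, three instruments.
rng = np.random.default_rng(0)
N = 5000
Z = rng.normal(size=(N, 3))
v = rng.normal(size=N)
x = Z @ np.array([1.0, 0.5, -0.5]) + v
u = 0.5 * v + rng.normal(size=N)       # correlated with x through v
y = 2.0 * x + u                        # true coefficient is 2.0
X = x[:, None]

A = np.linalg.inv(Z.T @ Z / N)         # 2SLS-type weighting matrix (an assumption)
beta_hat = linear_gmm(y, X, Z, A)
V = sandwich_variance(y, X, Z, beta_hat, A)
std_err = np.sqrt(np.diag(V) / N)      # finite-sample standard errors
print(beta_hat, std_err)
```

Replacing `A` with the inverse of the estimated moment variance `D` and re-estimating would give the usual two-step efficient variant; the sketch keeps a single step for brevity.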
Cite this article
Fang, G., Zhang, B. & Chen, K. Estimation of dynamic mixed double factors model in high-dimensional panel data. Soft Comput 24, 2527–2541 (2020). https://doi.org/10.1007/s00500-018-3603-1
DOI: https://doi.org/10.1007/s00500-018-3603-1