Randomized algorithms of maximum likelihood estimation with spatial autoregressive models for large-scale networks
Abstract
The spatial autoregressive (SAR) model is a classical model in spatial econometrics and has become an important tool in network analysis. However, with large-scale networks, existing methods of likelihood-based inference for the SAR model become computationally infeasible. Here we investigate maximum likelihood estimation for the SAR model with partially observed responses from large-scale networks. By taking advantage of recent developments in randomized numerical linear algebra, we derive efficient algorithms to estimate the spatial autocorrelation parameter in the SAR model. Results from extensive simulations and real data examples demonstrate empirically that the estimator obtained by our method, called the randomized maximum likelihood estimator, outperforms state-of-the-art methods, with smaller bias and standard error, especially for large-scale problems with moderate spatial autocorrelation. The theoretical properties of the estimator are explored, and consistency results are established.
Keywords
Maximum likelihood estimation · Network · Randomized numerical linear algebra · Spatial autoregressive model
Acknowledgements
We would like to thank the associate editor and two reviewers of Statistics and Computing for their insightful comments that greatly improved this work. Li’s work is partially supported by the Henry Laws Fellowship Award and the Taft Research Center at the University of Cincinnati. Kang’s research is partially supported by the Simons Foundation Collaboration Award (#317298) and the Taft Research Center at the University of Cincinnati. This work was supported in part by an allocation of computing time from the Ohio Supercomputer Center (OSC 1987). We would like to thank Dr. Shan Ba, Dr. Won Chang, Dr. Noel Cressie, Dr. Alex B. Konomi, and Dr. Siva Sivaganesan for their helpful suggestions.
References
- Anselin, L., Bera, A.K.: Spatial dependence in linear regression models with an introduction to spatial econometrics. Stat. Textb. Monogr. 155, 237–290 (1998)
- Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H.: Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(4), 825–848 (2008)
- Banerjee, S., Carlin, B.P., Gelfand, A.E.: Hierarchical Modeling and Analysis for Spatial Data. CRC Press, Boca Raton (2014)
- Barry, R.P., Pace, R.K.: Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Appl. 289(1–3), 41–54 (1999)
- Beck, N., Gleditsch, K.S., Beardsley, K.: Space is more than geography: Using spatial econometrics in the study of political economy. Int. Stud. Q. 50(1), 27–44 (2006)
- Boutsidis, C., Drineas, P., Kambadur, P., Kontopoulou, E.M., Zouzias, A.: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. arXiv preprint arXiv:1503.00374 (2015)
- Browne, K.: Snowball sampling: using social networks to research non-heterosexual women. Int. J. Soc. Res. Methodol. 8(1), 47–60 (2005)
- Burden, S., Cressie, N., Steel, D.G.: The SAR model for very large datasets: a reduced rank approach. Econometrics 3(2), 317–338 (2015)
- Chen, X., Chen, Y., Xiao, P.: The impact of sampling and network topology on the estimation of social intercorrelations. J. Market. Res. 50(1), 95–110 (2013)
- Cressie, N., Johannesson, G.: Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(1), 209–226 (2008)
- Darmofal, D.: Spatial Analysis for the Social Sciences. Cambridge University Press, Cambridge (2015)
- Doreian, P.: Estimating linear models with spatially distributed data. Sociol. Methodol. 12, 359–388 (1981)
- Doreian, P., Freeman, L., White, D., Romney, A.: Models of network effects on social actors. In: Research Methods in Social Network Analysis, pp. 295–317 (1989)
- Fujimoto, K., Chou, C.P., Valente, T.W.: The network autocorrelation model using two-mode data: affiliation exposure and potential bias in the autocorrelation parameter. Soc. Netw. 33(3), 231–243 (2011)
- Guruswami, V., Sinop, A.K.: Optimal column-based low-rank matrix reconstruction. In: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, pp. 1207–1214 (2012)
- Haggett, P.: Hybridizing alternative models of an epidemic diffusion process. Econ. Geogr. 52(2), 136–146 (1976)
- Lee, L.F.: Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72(6), 1899–1925 (2004)
- Lee, L.F., Liu, X.: Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econ. Theory 26(1), 187–230 (2010)
- Lee, L., Yu, J.: Estimation of spatial autoregressive panel data models with fixed effects. J. Econ. 154(2), 165–185 (2010)
- Lee, L.F., Liu, X., Lin, X.: Specification and estimation of social interaction models with network structures. Econ. J. 13(2), 145–176 (2010)
- Leenders, R.T.: Modeling social influence through network autocorrelation: constructing the weight matrix. Soc. Netw. 24(1), 21–47 (2002)
- LeSage, J., Pace, R.K.: Introduction to Spatial Econometrics. Chapman and Hall, Boca Raton (2009)
- LeSage, J.P., Pace, R.K.: Models for spatially dependent missing data. J. Real Estate Financ. Econ. 29(2), 233–254 (2004)
- Leskovec, J., Krevl, A.: SNAP Datasets: Stanford Large Network Dataset Collection (2014)
- Lichstein, J.W., Simons, T.R., Shriner, S.A., Franzreb, K.E.: Spatial autocorrelation and autoregressive models in ecology. Ecol. Monogr. 72(3), 445–463 (2002)
- Lin, X., Lee, L.F.: GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157(1), 34–52 (2010)
- Mahoney, M.W., et al.: Randomized algorithms for matrices and data. Found. Trends® Mach. Learn. 3(2), 123–224 (2011)
- O’Malley, A.J.: The analysis of social network data: an exciting frontier for statisticians. Stat. Med. 32(4), 539–555 (2013)
- Ord, K.: Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 70(349), 120–126 (1975)
- OSC: Ohio Supercomputer Center. Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73 (1987). Accessed 21 Dec 2018
- Pace, R.K., Barry, R.: Sparse spatial autoregressions. Stat. Probab. Lett. 33(3), 291–297 (1997)
- Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2006)
- Robins, G.: A tutorial on methods for the modeling and analysis of social network data. J. Math. Psychol. 57(6), 261–274 (2013)
- Robins, G., Pattison, P., Elliott, P.: Network models for social influence processes. Psychometrika 66(2), 161–189 (2001)
- Shao, J.: Mathematical Statistics. Springer, New York (2003)
- Smirnov, O., Anselin, L.: Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput. Stat. Data Anal. 35(3), 301–319 (2001)
- Smirnov, O.A.: Computation of the information matrix for models with spatial interaction on a lattice. J. Comput. Graph. Stat. 14(4), 910–927 (2005)
- Stewart, G.: Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix. Numer. Math. 83(2), 313–323 (1999)
- Suesse, T.: Estimation of spatial autoregressive models with measurement error for large data sets. Comput. Stat. 33(4), 1627–1648 (2018)
- Suesse, T.: Marginal maximum likelihood estimation of SAR models with missing data. Comput. Stat. Data Anal. 120, 98–110 (2018)
- Suesse, T., Chambers, R.: Using social network information for survey estimation. J. Off. Stat. 34(1), 181–209 (2018)
- Suesse, T., Zammit-Mangion, A.: Computational aspects of the EM algorithm for spatial econometric models with missing data. J. Stat. Comput. Simul. 87(9), 1767–1786 (2017)
- Sun, D., Tsutakawa, R.K., Speckman, P.L.: Posterior distribution of hierarchical models using CAR(1) distributions. Biometrika 86(2), 341–350 (1999)
- Wang, S., Luo, L., Zhang, Z.: SPSD matrix approximation vis column selection: theories, algorithms, and extensions. J. Mach. Learn. Res. 17(49), 1–49 (2016)
- Wang, W., Lee, L.F.: Estimation of spatial autoregressive models with randomly missing data in the dependent variable. Econ. J. 16(1), 73–102 (2013)
- Whittle, P.: On stationary processes in the plane. Biometrika 41, 434–449 (1954)
- Woodruff, D.P., et al.: Sketching as a tool for numerical linear algebra. Found. Trends® Theor. Comput. Sci. 10(1–2), 1–157 (2014)
- Zhou, J., Tu, Y., Chen, Y., Wang, H.: Estimating spatial autocorrelation with sampled network data. J. Bus. Econ. Stat. 35(1), 130–138 (2017)