A sampling approach to estimate the log determinant used in spatial likelihood problems

Pace, R. Kelley; LeSage, James P.

doi:10.1007/s10109-009-0087-7

Original Article
Published: 25 April 2009

Volume 11, pages 209–225, (2009)

Journal of Geographical Systems

R. Kelley Pace¹ &
James P. LeSage²

Abstract

Likelihood-based methods for modeling multivariate Gaussian spatial data have desirable statistical characteristics, but the practicality of these methods for massive georeferenced data sets is often questioned. A sampling algorithm is proposed that exploits a relationship involving log-pivots arising from matrix decompositions used to compute the log-determinant term that appears in the model likelihood. We demonstrate that the method can be used to successfully estimate log-determinants for large numbers of observations. Specifically, we produce a log-determinant estimate for a 3,954,400 by 3,954,400 matrix in less than two minutes on a desktop computer. The proposed method involves computations that are independent, making it amenable to out-of-core computation as well as to coarse-grained parallel or distributed processing. The proposed technique yields an estimated log-determinant and an associated confidence interval.
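
The relationship between log-pivots and the log-determinant mentioned in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the random neighbor matrix, the helper names (`build_matrix`, `log_pivots`, `estimate_logdet`), and the interval, which treats the sampled log-pivots as a simple random sample, are all assumptions made for illustration. The sketch only shows that the log-determinant of I − αD_r equals the sum of the log-pivots from elimination without row exchanges, so that n times the mean of a sample of log-pivots gives an estimate with a rough normal-approximation interval.

```python
import numpy as np


def build_matrix(n, alpha, seed=0):
    """Return I - alpha * D_r for a hypothetical random neighbor matrix."""
    rng = np.random.default_rng(seed)
    nbr = (rng.random((n, n)) < 0.1).astype(float)
    np.fill_diagonal(nbr, 0.0)
    for i in range(n):
        if nbr[i].sum() == 0:  # make sure every row has at least one neighbor
            nbr[i, (i + 1) % n] = 1.0
    d_r = nbr / nbr.sum(axis=1, keepdims=True)  # row-stochastic scaling
    return np.eye(n) - alpha * d_r


def log_pivots(a):
    """Log-pivots from Gaussian elimination without row exchanges."""
    u = np.array(a, dtype=float)
    n = u.shape[0]
    for k in range(n - 1):
        factors = u[k + 1:, k] / u[k, k]
        u[k + 1:, k:] -= np.outer(factors, u[k, k:])
    return np.log(np.diag(u))


def estimate_logdet(log_piv, m, seed=0):
    """Scale up the mean of m sampled log-pivots; return (estimate, low, high)."""
    rng = np.random.default_rng(seed)
    n = log_piv.size
    sample = rng.choice(log_piv, size=m, replace=False)
    est = n * sample.mean()
    # Assumption of this sketch: normal approximation on a simple random sample.
    half = 1.96 * n * sample.std(ddof=1) / np.sqrt(m)
    return est, est - half, est + half


if __name__ == "__main__":
    a = build_matrix(200, alpha=0.8)
    lp = log_pivots(a)
    exact = np.linalg.slogdet(a)[1]
    est, low, high = estimate_logdet(lp, m=50)
    print(f"exact {exact:.4f}, sum of log-pivots {lp.sum():.4f}")
    print(f"sampled estimate {est:.4f}, interval [{low:.4f}, {high:.4f}]")
```

Here all log-pivots are computed first and then sampled, which defeats the purpose for a large problem; the point is only to show the identity and the scaling step.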

Notes

The Home Mortgage Disclosure Act (HMDA) data contain over 100 million observations, and the 2000 Census contains information organized into over eight million blocks.
For example, the spatial dependence parameter affects the regression parameter estimates for models with spatial lags of the dependent variables.
This was done using Matlab on a 2.8 gigahertz dual-processor Opteron computer.
One could symmetricize D_r to ensure all real eigenvalues by defining D_s = S(D_r + D_r′)S, where S is a diagonal scaling matrix. Done in the most straightforward way, symmetricization requires holding the entire D_r in memory, since rescaling the symmetricized matrix requires operations involving all rows and columns. In contrast, scaling N to yield D_r is a simple scalar operation. Consequently, D_r as well as D_r z can be created row-by-row and are thus well suited to parallel and distributed processing.
The lack of dependence of log-pivots u(α)_i on elements in Z(α)_l,m where l, m > i is a feature of LU and Cholesky decompositions not shared by the Schur, spectral, or singular value decompositions. Note, the log-pivots are always real for real non-symmetric matrices, whereas quantities such as eigenvalues may be complex.
To construct this comparison between infill and increasing domain ordering, only the first 10,000 observations from Census block group locations were used. Because infill orderings produce matrices with a high bandwidth (many non-zero elements far away from the diagonal), they lead to nearly dense LU or Cholesky triangular matrices. This resulted in computer memory problems, which required a reduction in the sample size used to construct the example. In contrast, the increasing domain ordering results in a more local I_n − αD_S and sparser LU matrices. There were no memory problems using the entire set of 213,172 observations for the increasing domain ordering. Similar results pertain to timing: finding the log-determinant of the first 10,000 observations took 3.48 s with the increasing domain ordering and 660.91 s with the infill ordering.
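
The locality property of log-pivots described in the notes above can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' code: the matrix is a hypothetical symmetric I − αD_s for points on a line, and `sym_spatial_matrix`, `cholesky_log_pivots`, and `check_locality` are illustrative names. It verifies that the first i Cholesky log-pivots of the full matrix match those of its leading i × i block and are unchanged when only the trailing block is modified.

```python
import numpy as np


def sym_spatial_matrix(n, alpha):
    """I - alpha * D_s for points on a line, each neighboring i-1 and i+1."""
    nbr = np.zeros((n, n))
    idx = np.arange(n - 1)
    nbr[idx, idx + 1] = 1.0
    nbr[idx + 1, idx] = 1.0
    s = 1.0 / np.sqrt(nbr.sum(axis=1))      # diagonal of the scaling matrix S
    d_s = s[:, None] * nbr * s[None, :]     # symmetric scaling S N S
    return np.eye(n) - alpha * d_s


def cholesky_log_pivots(a):
    """Log-pivots of a symmetric positive definite matrix via A = L L^T."""
    return 2.0 * np.log(np.diag(np.linalg.cholesky(a)))


def check_locality(n, alpha, i):
    """Compare the first i log-pivots under three different computations."""
    a = sym_spatial_matrix(n, alpha)
    full = cholesky_log_pivots(a)
    lead = cholesky_log_pivots(a[:i, :i])   # leading block only
    perturbed = a.copy()
    perturbed[i:, i:] += np.eye(n - i)      # change only the trailing block
    pert = cholesky_log_pivots(perturbed)
    return np.allclose(full[:i], lead), np.allclose(full[:i], pert[:i])


if __name__ == "__main__":
    print(check_locality(n=200, alpha=0.9, i=50))
```

Because a log-pivot depends only on the leading block that precedes it, a sampled log-pivot can in principle be computed from a submatrix without factoring the whole matrix, which is what makes independent, distributed computation plausible.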

Author information

Authors and Affiliations

Department of Finance, LREC Endowed Chair of Real Estate, E.J. Ourso College of Business Administration, Louisiana State University, Baton Rouge, LA, 70803-6308, USA
R. Kelley Pace
Department of Finance and Economics, Fields Endowed Chair in Urban and Regional Economics, McCoy College of Business Administration, Texas State University-San Marcos, San Marcos, TX, 78666, USA
James P. LeSage

Corresponding author

Correspondence to James P. LeSage.

Additional information

Kelley Pace would like to acknowledge support from NSF SES-0729259 and from the Louisiana Sea Grant program. Both authors would like to thank Jennifer Zhu and an anonymous reviewer for helpful comments. James LeSage is grateful for support from NSF SES-0729264 and the Texas Sea Grant program.

About this article

Cite this article

Pace, R.K., LeSage, J.P. A sampling approach to estimate the log determinant used in spatial likelihood problems. J Geogr Syst 11, 209–225 (2009). https://doi.org/10.1007/s10109-009-0087-7

Received: 01 December 2007
Accepted: 02 April 2009
Published: 25 April 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s10109-009-0087-7
