A sampling approach to estimate the log determinant used in spatial likelihood problems

  • Original Article
Journal of Geographical Systems

Abstract

Likelihood-based methods for modeling multivariate Gaussian spatial data have desirable statistical characteristics, but the practicality of these methods for massive georeferenced data sets is often questioned. A sampling algorithm is proposed that exploits a relationship involving log-pivots arising from matrix decompositions used to compute the log-determinant term that appears in the model likelihood. We demonstrate that the method can successfully estimate log-determinants for large numbers of observations. Specifically, we produce a log-determinant estimate for a 3,954,400 by 3,954,400 matrix in less than two minutes on a desktop computer. The proposed method involves computations that are independent, making it amenable to out-of-core computation as well as to coarse-grained parallel or distributed processing. The proposed technique yields an estimated log-determinant and an associated confidence interval.
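
The identity behind the abstract can be sketched in a few lines. This is only an illustration, not the authors' algorithm. It relies on the fact that, for elimination without row exchanges, the log-determinant equals the sum of the log-pivots, and it then forms a naive estimate: n times the mean of a random sample of log-pivots, with a normal-approximation interval. The weight matrix built by `line_weights`, the values of `n`, `alpha`, and the sample size, and all function names are assumptions made for this example. Every pivot is computed here only so that the estimate can be compared with the exact value; the paper's procedure for obtaining sampled log-pivots is not reproduced.

```python
import math
import random


def log_pivots(a):
    """Logs of the pivots from Gaussian elimination without row exchanges.

    Assumes every pivot is positive, which holds for I - alpha*D when D is
    row-stochastic with zero diagonal and |alpha| < 1 (strict diagonal dominance).
    """
    n = len(a)
    u = [row[:] for row in a]  # work on a copy
    logs = []
    for i in range(n):
        pivot = u[i][i]
        if pivot <= 0.0:
            raise ValueError("non-positive pivot; this sketch does not apply")
        logs.append(math.log(pivot))
        for r in range(i + 1, n):
            factor = u[r][i] / pivot
            if factor != 0.0:  # skip rows that need no update
                for c in range(i, n):
                    u[r][c] -= factor * u[i][c]
    return logs


def line_weights(n):
    """Hypothetical row-stochastic weights: each site's neighbours on a line."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            d[i][j] = 1.0 / len(nbrs)
    return d


def sampled_logdet(logs, m, rng, z=1.96):
    """n * (sample mean of m log-pivots) with a normal-approximation interval."""
    n = len(logs)
    sample = rng.sample(logs, m)
    mean = sum(sample) / m
    var = sum((x - mean) ** 2 for x in sample) / (m - 1)
    half = z * n * math.sqrt(var / m)
    est = n * mean
    return est, est - half, est + half


n, alpha = 200, 0.8  # illustrative values only
d = line_weights(n)
z_mat = [[(1.0 if i == j else 0.0) - alpha * d[i][j] for j in range(n)]
         for i in range(n)]
logs = log_pivots(z_mat)
exact = sum(logs)  # exact log-determinant of I - alpha*D
est, lo, hi = sampled_logdet(logs, 50, random.Random(0))
```

Printing `exact` next to `(est, lo, hi)` shows how a sample of 50 log-pivots compares with the full sum for this toy matrix. Because each log-pivot is a separate number, the averaging step needs no shared state.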

Notes

  1. The Home Mortgage Disclosure Act (HMDA) data contain over 100 million observations, and the 2000 Census contains information organized into over eight million blocks.

  2. For example, the spatial dependence parameter affects the regression parameter estimates for models with spatial lags of the dependent variables.

  3. This was run using Matlab on a 2.8 GHz dual-processor Opteron computer.

  4. One could symmetrize $D_r$ to ensure that all eigenvalues are real by defining $D_s = S(D_r + D_r')S$, where $S$ is a diagonal scaling matrix and $D_r'$ denotes the transpose of $D_r$. Done in the most straightforward way, symmetrization requires holding the entire $D_r$ in memory, since rescaling the symmetrized matrix requires operations involving all rows and columns. In contrast, scaling $N$ to yield $D_r$ is a simple scalar operation. Consequently, $D_r$ as well as $D_r z$ can be created row by row, and both are thus well suited to parallel and distributed processing.

  5. The lack of dependence of the log-pivots $u(\alpha)_i$ on elements $Z(\alpha)_{l,m}$ where $l, m > i$ is a feature of the LU and Cholesky decompositions not shared by the Schur, spectral, or singular value decompositions. Note that the log-pivots are always real for real non-symmetric matrices, whereas quantities such as eigenvalues may be complex.

  6. To construct this comparison between infill and increasing domain orderings, only the first 10,000 observations from Census block group locations were used. Infill orderings produce matrices with a high bandwidth (many non-zero elements far from the diagonal), which leads to nearly dense LU or Cholesky triangular matrices. The resulting computer memory problems required a reduction in the sample size used to construct the example. In contrast, the increasing domain ordering results in a more local $I_n - \alpha D_S$ and sparser LU matrices. There were no memory problems using the entire set of 213,172 observations for the increasing domain ordering. Similar results pertain to timing: finding the log-determinant for the first 10,000 observations took 3.48 s with the increasing domain ordering and 660.91 s with the infill ordering.
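
Two of the points in the notes above can be checked on a toy example. The sketch below assumes that D_r is the row-normalized version of a binary neighbour matrix N, which the notes suggest but do not spell out. The `neighbours` dictionary, the value of `alpha`, and the function names are hypothetical. It first builds rows of D_r and the product D_r z one row at a time, and then confirms that the first few log-pivots of Z(α) = I − αD_r are unchanged when only the leading block of the matrix is used.

```python
import math

# Hypothetical neighbour lists for 5 sites (a sparse stand-in for N).
neighbours = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}


def dr_row(i):
    """Row i of D_r: row i of N divided by its own row sum."""
    nbrs = neighbours[i]
    return {j: 1.0 / len(nbrs) for j in nbrs}


def dr_times(z):
    """D_r z built one row at a time; no other row of D_r is needed."""
    return [sum(w * z[j] for j, w in dr_row(i).items()) for i in range(len(z))]


def z_alpha(alpha, n):
    """Dense leading n-by-n block of Z(alpha) = I - alpha * D_r."""
    rows = []
    for i in range(n):
        row = [0.0] * n
        row[i] = 1.0
        for j, w in dr_row(i).items():
            if j < n:  # keep only columns inside the leading block
                row[j] -= alpha * w
        rows.append(row)
    return rows


def log_pivots(a):
    """Log-pivots from elimination without row exchanges."""
    u = [row[:] for row in a]
    n = len(u)
    out = []
    for i in range(n):
        out.append(math.log(u[i][i]))
        for r in range(i + 1, n):
            f = u[r][i] / u[i][i]
            for c in range(i, n):
                u[r][c] -= f * u[i][c]
    return out


alpha = 0.7  # illustrative value with |alpha| < 1
full = log_pivots(z_alpha(alpha, 5))  # all five sites
lead = log_pivots(z_alpha(alpha, 3))  # only rows and columns 0..2

# The first three log-pivots agree: later rows and columns play no role.
same = all(abs(a - b) < 1e-12 for a, b in zip(full[:3], lead))
```

Here `same` evaluates to `True`, since the elimination steps for the first three pivots never read entries outside the leading 3-by-3 block. An eigenvalue-based decomposition of the same matrix would not have this property, which is the contrast drawn in note 5.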

Author information

Corresponding author

Correspondence to James P. LeSage.

Additional information

Kelley Pace would like to acknowledge support from NSF SES-0729259 and from the Louisiana Sea Grant program. Both authors would like to thank Jennifer Zhu and an anonymous reviewer for helpful comments. James LeSage is grateful for support from NSF SES-0729264 and the Texas Sea Grant program.

About this article

Cite this article

Pace, R.K., LeSage, J.P. A sampling approach to estimate the log determinant used in spatial likelihood problems. J Geogr Syst 11, 209–225 (2009). https://doi.org/10.1007/s10109-009-0087-7
