Sampling from spatial databases

Statistics and Computing

Abstract

This paper deals with techniques for obtaining random point samples from spatial databases. We seek random points from a continuous domain (usually ℝ²) that satisfy a spatial predicate represented in the database as a collection of polygons. Several applications of spatial sampling (e.g. environmental monitoring, agronomy, and forestry) are described. Sampling problems are characterized in terms of two key parameters: coverage (selectivity) and expected stabbing number (overlap). We discuss two fundamental approaches to sampling with spatial predicates, depending on whether we sample first or evaluate the predicate first. The approaches are described in the context of both quadtrees and R-trees, detailing the sample first, acceptance/rejection tree, and partial area tree algorithms. A sequential algorithm, the one-pass spatial reservoir algorithm, is also described. The relative performance of the various sampling algorithms is compared, and preferred algorithms are suggested. We conclude with a short discussion of possible extensions.
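To make the "sample first" approach concrete, the Python sketch below draws candidate points uniformly from a bounding box and keeps only those that fall inside at least one polygon (the spatial predicate). It is a generic acceptance/rejection illustration written under our own assumptions, not the paper's quadtree or R-tree algorithm: polygons are simple and given as vertex lists, they are scanned linearly rather than through a spatial index, and the names `point_in_polygon`, `stabbing_number` and `sample_first` are hypothetical. Coverage is taken here to be the fraction of the bounding box covered by the union of the polygons, so each accepted point costs about 1/coverage candidate draws on average; `stabbing_number` counts how many polygons contain a given point.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # vertices in order; the closing edge is implied


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: cast a ray toward +x and count edge crossings."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def stabbing_number(p: Point, polygons: List[Polygon]) -> int:
    """Number of polygons containing p (linear scan; an index would prune this)."""
    return sum(point_in_polygon(p, poly) for poly in polygons)


def sample_first(polygons: List[Polygon],
                 bbox: Tuple[float, float, float, float],
                 k: int,
                 rng: Optional[random.Random] = None):
    """Draw k points uniformly from the union of polygons by acceptance/rejection.

    bbox = (xmin, ymin, xmax, ymax) is assumed to enclose every polygon.
    Returns (samples, trials); trials counts all candidate points drawn.
    """
    rng = rng or random.Random()
    xmin, ymin, xmax, ymax = bbox
    samples: List[Point] = []
    trials = 0
    while len(samples) < k:
        trials += 1
        # Sample first: a uniform candidate over the bounding box ...
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # ... then evaluate the predicate; accepted points are uniform on the union.
        if any(point_in_polygon(p, poly) for poly in polygons):
            samples.append(p)
    return samples, trials


if __name__ == "__main__":
    sq1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    sq2 = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]
    pts, n = sample_first([sq1, sq2], (0.0, 0.0, 4.0, 4.0), 100, random.Random(0))
    print(len(pts), n)
```

For example, the two overlapping unit squares in the usage block cover 1.75 of the 16 square units in the 4 × 4 box, so roughly nine candidates are drawn per accepted point; as coverage shrinks, this rejection cost grows, which is why evaluating the predicate first (for instance, choosing a polygon with probability proportional to its area) can be preferable.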

About this article

Cite this article

Olken, F., Rotem, D. Sampling from spatial databases. Stat Comput 5, 43–57 (1995). https://doi.org/10.1007/BF00140665

  • DOI: https://doi.org/10.1007/BF00140665
