Abstract
This paper deals with techniques for obtaining random point samples from spatial databases. We seek random points from a continuous domain (usually ℝ²) which satisfy a spatial predicate that is represented in the database as a collection of polygons. Several applications of spatial sampling (e.g. environmental monitoring, agronomy, and forestry) are described. Sampling problems are characterized in terms of two key parameters: coverage (selectivity) and expected stabbing number (overlap). We discuss two fundamental approaches to sampling with spatial predicates, depending on whether we sample first or evaluate the predicate first. The approaches are described in the context of both quadtrees and R-trees, detailing the sample-first, acceptance/rejection tree, and partial area tree algorithms. A sequential algorithm, the one-pass spatial reservoir algorithm, is also described. The relative performance of the various sampling algorithms is compared, and a choice of preferred algorithms is suggested. We conclude with a short discussion of possible extensions.
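As a rough illustration of the sample-first idea only, the following minimal sketch draws uniform points from a bounding box and keeps those that fall inside at least one polygon. It is a generic rejection sampler under stated assumptions, not the quadtree or R-tree algorithms of the paper: the names `point_in_polygon` and `sample_first`, the list-of-vertices polygon representation, and the `max_tries` guard are all hypothetical choices made for this example.

```python
import random


def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_first(polygons, bbox, k, rng=None, max_tries=1_000_000):
    """Draw k uniform points from the union of the polygons by rejection.

    bbox is (xmin, ymin, xmax, ymax) and is assumed to enclose every polygon.
    Returns the accepted points and the number of candidate points drawn.
    """
    rng = rng or random.Random()
    xmin, ymin, xmax, ymax = bbox
    samples = []
    tries = 0
    while len(samples) < k:
        if tries >= max_tries:
            raise RuntimeError("too many rejections; coverage may be very low")
        tries += 1
        # Sample first: pick a candidate uniformly from the bounding box ...
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # ... then evaluate the spatial predicate (membership in any polygon).
        if any(point_in_polygon(x, y, p) for p in polygons):
            samples.append((x, y))
    return samples, tries


if __name__ == "__main__":
    # Two overlapping squares inside a 4 x 4 bounding box.
    squares = [
        [(0, 0), (2, 0), (2, 2), (0, 2)],
        [(1, 1), (3, 1), (3, 3), (1, 3)],
    ]
    points, tries = sample_first(squares, (0, 0, 4, 4), 5, random.Random(1))
    print(len(points), "points accepted after", tries, "candidates")
```

Because a point lying in several overlapping polygons is still accepted only once, overlap does not bias this sketch. The acceptance probability equals the coverage of the bounding box by the polygon union, so the expected number of candidates per accepted point is the reciprocal of the coverage, which is why low-coverage predicates make a plain sample-first scheme expensive.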
Cite this article
Olken, F., Rotem, D. Sampling from spatial databases. Stat Comput 5, 43–57 (1995). https://doi.org/10.1007/BF00140665
DOI: https://doi.org/10.1007/BF00140665