Statistics and Computing, Volume 5, Issue 1, pp 43–57

Sampling from spatial databases

  • Frank Olken
  • Doron Rotem

Abstract

This paper deals with techniques for obtaining random point samples from spatial databases. We seek random points from a continuous domain (usually ℝ²) which satisfy a spatial predicate that is represented in the database as a collection of polygons. Several applications of spatial sampling (e.g. environmental monitoring, agronomy and forestry) are described. Sampling problems are characterized in terms of two key parameters: coverage (selectivity) and expected stabbing number (overlap). We discuss two fundamental approaches to sampling with spatial predicates, depending on whether we sample first or evaluate the predicate first. The approaches are described in the context of both quadtrees and R-trees, detailing the sample-first, acceptance/rejection tree, and partial area tree algorithms. A sequential algorithm, the one-pass spatial reservoir algorithm, is also described. The relative performance of the various sampling algorithms is compared and a choice of preferred algorithms is suggested. We conclude with a short discussion of possible extensions.
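To make the "sample first" idea concrete, the following is a minimal sketch under stated assumptions, not the algorithm as specified in the paper. It assumes the predicate is a plain list of simple polygons and that a bounding box of the domain is known. It draws uniform points from the box and keeps those that fall inside at least one polygon. The function names `point_in_polygon` and `sample_first`, the `max_trials` guard, and the use of a linear scan in place of a quadtree or R-tree lookup are all illustrative choices.

```python
import random


def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_first(polygons, bbox, k, rng=None, max_trials=1_000_000):
    """Draw up to k points uniformly from the union of the polygons.

    Hypothetical helper: points are generated uniformly in bbox
    (xmin, ymin, xmax, ymax) and accepted if any polygon contains them.
    A point covered by several overlapping polygons is still accepted
    only once, so the accepted points are uniform over the union.
    Returns the accepted points and the number of trials used.
    """
    rng = rng or random.Random()
    xmin, ymin, xmax, ymax = bbox
    accepted = []
    trials = 0
    while len(accepted) < k and trials < max_trials:
        trials += 1
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Linear scan stands in for a spatial index lookup.
        if any(point_in_polygon(x, y, p) for p in polygons):
            accepted.append((x, y))
    return accepted, trials


if __name__ == "__main__":
    squares = [
        [(0, 0), (2, 0), (2, 2), (0, 2)],
        [(1, 1), (3, 1), (3, 3), (1, 3)],  # overlaps the first square
    ]
    points, trials = sample_first(squares, (0, 0, 4, 4), 100, random.Random(1))
    # The acceptance rate is a rough estimate of coverage (selectivity).
    print(len(points), trials, len(points) / trials)
```

In this sketch the ratio of accepted points to trials gives a rough estimate of coverage, which suggests why a sample-first strategy becomes expensive when coverage is low: most generated points are rejected.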

Keywords

geographic information systems; GIS; quadtrees; query processing; query optimization; R-trees; random sampling; relational databases; reservoir sampling; sampling algorithms; simple random sampling; sequential sampling; spatial data structures; spatial databases

Copyright information

© Chapman & Hall 1995

Authors and Affiliations

  • Frank Olken (1)
  • Doron Rotem (1, 2)
  1. Information and Computing Sciences Division, Lawrence Berkeley Laboratory, Berkeley, USA
  2. Management Information Systems Department, School of Business, San Jose State University, San Jose, USA
