
Random sampling from database files: A survey

  • Conference paper
Statistical and Scientific Database Management (SSDBM 1990)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 420)

Abstract

In this paper we survey known results on algorithms, data structures, and some applications of random sampling from databases. We first discuss various reasons for sampling from databases and for including sampling as a DBMS operator. We then consider basic sampling algorithms, sampling from trees, sampling from hash tables, and auxiliary memory-resident index information that facilitates sampling.
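The abstract names basic sampling algorithms and sampling from trees without spelling them out. As a hedged illustration only (not the paper's own code), the Python sketch below shows two textbook techniques: one-pass sequential selection sampling without replacement (Knuth's Algorithm S), and acceptance/rejection sampling of a single record from a tree whose leaves all sit at the same depth, as in a B+ tree. The `Node` class, the `max_fanout` parameter and both function names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def selection_sample(records: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Simple random sample of size n without replacement, in one scan.

    Record t (0-based) is kept with probability (n - m) / (N - t), where m is
    the number of records kept so far (Knuth's Algorithm S).
    """
    N = len(records)
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= len(records)")
    sample: List[T] = []
    for t, rec in enumerate(records):
        if len(sample) == n:
            break
        # Keep with probability (still needed) / (still unseen).
        if rng.random() * (N - t) < n - len(sample):
            sample.append(rec)
    return sample


@dataclass
class Node:
    """Hypothetical tree node: internal nodes have children, leaves have records."""
    children: List["Node"] = field(default_factory=list)
    records: List[object] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def ar_tree_sample(root: Node, max_fanout: int, rng: random.Random) -> object:
    """Draw one record uniformly at random by acceptance/rejection.

    Assumptions: all leaves are at the same depth, no node has more than
    max_fanout children (or records), and the tree holds at least one record.
    """
    while True:
        node, accept_p = root, 1.0
        # Random root-to-leaf walk; scale acceptance by fanout / max_fanout.
        while not node.is_leaf:
            f = len(node.children)
            accept_p *= f / max_fanout
            node = node.children[rng.randrange(f)]
        f = len(node.records)
        if f == 0:
            continue  # empty leaf: reject this walk and retry
        accept_p *= f / max_fanout
        rec = node.records[rng.randrange(f)]
        # Accepting with accept_p makes every record equally likely overall.
        if rng.random() < accept_p:
            return rec


if __name__ == "__main__":
    rng = random.Random(42)
    print(selection_sample(list(range(10)), 3, rng))
    root = Node(children=[Node(records=[1, 2, 3]), Node(records=[4])])
    print(ar_tree_sample(root, max_fanout=3, rng=rng))
```

The acceptance step rescales the probability of reaching any given record to the same constant, so the output is uniform; the price is repeated walks when nodes are far from full, which is why the bound `max_fanout` should be as tight as possible.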

This work was supported by the Director, Office of Energy Research, Office of Basic Energy Sciences, Applied Mathematical Sciences Division of the U.S. Department of Energy under Contract DE-AC03-76SF00098.



Author information

Authors: F. Olken, D. Rotem

Editor information

Zbigniew Michalewicz


Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Olken, F., Rotem, D. (1990). Random sampling from database files: A survey. In: Michalewicz, Z. (eds) Statistical and Scientific Database Management. SSDBM 1990. Lecture Notes in Computer Science, vol 420. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-52342-1_23


  • DOI: https://doi.org/10.1007/3-540-52342-1_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-52342-0

  • Online ISBN: 978-3-540-46968-1

  • eBook Packages: Springer Book Archive
