Advertisement

Data Mining and Knowledge Discovery

, Volume 4, Issue 4, pp 281–314 | Cite as

Informix under CONTROL: Online Query Processing

  • Joseph M. Hellerstein
  • Ron Avnur
  • Vijayshankar Raman
Article

Abstract

The goal of the CONTROL project at Berkeley is to develop systems for interactive analysis of large data sets. We focus on systems that provide users with iteratively refining answers to requests and online control of processing, thereby tightening the loop in the data analysis process. This paper presents the database-centric subproject of CONTROL: a complete online query processing facility, implemented in a commercial Object-Relational DBMS from Informix. We describe the algorithms at the core of the system, and detail the end-to-end issues required to bring the algorithms together and deliver a complete system.

online query processing interactive informix control data analysis ripple joins online reoredering 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R. 1997. Personal communication.Google Scholar
  2. Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proc. 20th International Conference on Very Large Data Bases, Santiago de Chile, September 1994.Google Scholar
  3. Aiken, A., Chen, J., Stonebraker, M., and Woodruff, A. 1996.Tioga-2:Adirect-manipulation database visualization environment. In Proc. 12th IEEE International Conference on Data Engineering, New Orleans, February 1996.Google Scholar
  4. Antoshenkov, G. 1992. Random sampling from pseudo-ranked B+ trees. In Proc. 18th International Conference on Very Large Data Bases, Vancouver, August 1992.Google Scholar
  5. Antoshenkov, G. and Ziauddin, M. 1996. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229–237.CrossRefGoogle Scholar
  6. Aoki, P.M. 1998. Generalizing “search” in generalized search trees. In IEEE International Conference on Data Engineering, Orlando, February 1998.Google Scholar
  7. Astrahan, M., Blasgen, M., Chamberlin, D., Eswaran, K., Gray, J., Griffiths, P., King, W., Lorie, R., McJones, P., Mehl, J., Putzolu, G., Traiger, I., Wade, B., and Watson, V. 1976. System R: Relational approach to database management. ACM Transactions on Database Systems, 1(2):97–137.CrossRefGoogle Scholar
  8. Avnur, R. and Hellerstein, J.M. 2000. Eddies: Continuously adaptive query processing. In Proc. ACM-SIGMOD International Conference on Management of Data, Dallas, May 2000.Google Scholar
  9. Bayardo Jr., R.J. and Miranker, D.P. 1996. Processing queries for first-few answers. In Fifth Intl. Conf. Information and Knowledge Management, Rockville, MD.Google Scholar
  10. Carey, M.J. and Kossmann, D. 1997. On saying “Enough Already!” in SQL. In Proc. ACM-SIGMOD International Conference on Management of Data, Tucson, May 1997.Google Scholar
  11. Carey, M.J. and Kossmann, D. 1998. Reducing the braking distance of an SQL query engine. In Proc. 24th International Conference on Very Large Data Bases, New York City.Google Scholar
  12. Chaudhuri, S. and Gravano, L. 1996. Optimizing queries over multimedia repositories. In Proc. ACM-SIGMOD International Conference on Management of Data, Montreal, June 1996.Google Scholar
  13. Chaudhuri, S. and Gravano, L. 1999. Evaluating top-k selection queries. In Proc. International Conference on Very Large Data Bases, Edinburgh.Google Scholar
  14. Chaudhuri, S. and Narasayya, V. 1998. AutoAdmin “What-If” index analysis utility. In Proc. ACM-SIGMOD International Conference on Management of Data, Seattle, June 1998.Google Scholar
  15. Chaudhuri, S. and Shim, K. 1996. Optimization of queries with user-defined predicates. In Proc. 24th International Conference on Very Large Data Bases, Bombay (Mumbai), September 1996.Google Scholar
  16. Cherniack, M. 1998. Building query optimizers with combinators. PhD Thesis, Brown University.Google Scholar
  17. DeWitt, D.J., Katz, R.H., Olken, Frank, Shapiro, L.D., Stonebraker, R.M., and Wood, D. 1984. Implementation techniques for main memory database systems. In Proc. ACM-SIGMOD International Conference on Management of Data, Boston, June 1984.Google Scholar
  18. Donjerkovic, D. and Ramakrishnan, R. 1999. Probabilistic optimization of Top N queries. In Proc. International Conference on Very Large Data Bases, Edinburgh.Google Scholar
  19. Fagin, R. 1998. Fuzzy queries in multimedia database systems. In Proc. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, June 1998.Google Scholar
  20. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. 1996. The kdd process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11).Google Scholar
  21. Fushimi, S., Kitsuregawa, M., and Tanaka, H. 1986. An overview of the system software of a parallel relational database machine GRACE. In Proc. 24th International Conference on Very Large Data Bases, Kyoto, August 1986.Google Scholar
  22. Gibbons, P.B. and Matias, Y. 1998. New sampling-based summary statistics for improving approximate query answers. In Proc. ACM-SIGMOD International Conference on Management of Data, Seattle.Google Scholar
  23. Gibbons, P.B., Poosala, V., Acharya, S., Bartal, Y., Matias, Y., Muthukrishnan, S., Ramaswamy, S., and Suel, T. 1998. Aqua: System and techniques for approximate query answering. Technical Report, Bell Laboratories.Google Scholar
  24. Graefe, G. 1993. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73–170.CrossRefGoogle Scholar
  25. Gray, J. and Graefe, G. 1997. The five-minute rule ten years later, and other computer storage rules of thumb. SIGMOD Record, 26(4).Google Scholar
  26. Haas, P.J. 1996. Hoeffding inequalities for join-selectivity estimation and online aggregation. IBM Research Report RJ 10040, IBM Almaden Research Center.Google Scholar
  27. Haas, P.J. 1997. Large-sample and deterministic confidence intervals for online aggregation. In Proc. 9th International Conference on Scientific and Statistical Database Management, Olympia, WA, August 1997.Google Scholar
  28. Haas, P.J. and Hellerstein, J.M. 1999. Ripple algorithms for online aggregation. In Proc. ACM-SIGMOD International Conference on Management of Data, Philadelphia, May 1999.Google Scholar
  29. Haas, P.J., Naughton, J.F., Seshadri, S., and Swami, A.N. 1996. Selectivity and cost estimation for joins based on random sampling. Journal of Computer System Science, 52:550–569.CrossRefMathSciNetGoogle Scholar
  30. Harinarayan, V., Rajaraman, A., and Ullman, J.D. 1996. Implementing data cubes efficiently. In Proc. ACMSIGMOD International Conference on Management of Data, Montreal, June 1996.Google Scholar
  31. Hellerstein, J.M. 1997a. The case for online aggregation. Computer Science Technical Report CSD-97-958, University of California, Berkeley.Google Scholar
  32. Hellerstein, J.M. 1997b. Online processing redux. IEEE Data Engineering Bulletin, 20(3).Google Scholar
  33. Hellerstein, J.M. 1998a. Looking forward to interactive queries. Database Programming and Design, 11(8):28–33.Google Scholar
  34. Hellerstein, J.M. 1998b. Optimization techniques for queries with expensive predicates. ACM Transactions on Database Systems, 23(2).Google Scholar
  35. Hellerstein, J.M., Avnur, R., Chou, A., Hidber, C., Olston, C., Raman, V., and Roth, T. 1999. Interactive Data Analysis with CONTROL. IEEE Computer 32(9):51–59.Google Scholar
  36. Hellerstein, J.M., Haas, P.J., and Wang, H.J. 1997. Online aggregation. In Proc. ACM-SIGMOD International Conference on Management of Data, Tucson, May 1997.Google Scholar
  37. Hellerstein, J.M. and Naughton, J.F. 1996. Query execution techniques for caching expensive methods. In Proc. ACM-SIGMOD International Conference on Management of Data, Montreal, June 1996.Google Scholar
  38. Hidber, C. 1997. Online association rule mining. In Proc. ACM-SIGMOD International Conference on Management of Data, Tucson, May 1997.Google Scholar
  39. Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58.Google Scholar
  40. Hou, W.C., Ozsoyoglu, G., and Taneja, B.K. 1988. Statistical estimators for relational algebra expressions. In Proc. 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Austin, March 1998.Google Scholar
  41. Hou, W.C., Ozsoyoglu, G., and Taneja, B.K. 1989. Processing aggregate relational queries with hard time constraints. In Proc. ACM-SIGMOD International Conference on Management of Data, Portland, May-June 1989.Google Scholar
  42. Hyperion Essbase OLAP Server, 1998. URL http://www.hyperion.com/essbaseolap.cfm.Google Scholar
  43. Illustra Information Technologies, Inc. 1994. Illustra User's Guide, Illustra Server Release 2.1.Google Scholar
  44. Informix Corp. 1998a. Sampling: The latest breakthrough in decision support technology. Informix White Paper 000-21681-70.Google Scholar
  45. Informix Corp. 1998b. C-ISAM Version 7.24 for the UNIX Operating System.Google Scholar
  46. Informix Corp. 1998c. Informix Dynamic Server with Universal Data Option 9.1x.Google Scholar
  47. Knuth, D.E. 1973. The Art of Computer Programming: Vol. 3, Sorting and Searching. Addison-Wesley.Google Scholar
  48. Leung, T.Y.C., Pirahesh, H., Seshadri, P., and Hellerstein, J.M. 1998. Query rewrite optimization rules in IBM DB/2 universal database. In Readings in Database Systems, 3rd ed., M. Stonebraker and J.M. Hellerstein (Eds.). San Francisco: Morgan-Kaufmann.Google Scholar
  49. Lipton, R.J., Naughton, J.F., Schneider, D.A., and Seshadri, S. 1993. Efficient sampling strategies for relational database operations. Theoretical Computer Science, 116:195–226.CrossRefMathSciNetGoogle Scholar
  50. Livny, M., Ramakrishnan, R., Beyer, K.S., Chen, G., Donjerkovic, D., Lawande, S., and Myllymaki, J. 1997. DEVise: Integrated querying and visualization of large datasets. In Proc. ACM-SIGMOD International Conference on Management of Data, Tucson, May 1997.Google Scholar
  51. Lynch, C. and Stonebraker, M. 1988. Extended user-defined indexing with application to textual databases. In Proc. 14th International Conference on Very Large Data Bases, Los Angeles, August-Septeber 1998.Google Scholar
  52. Maier, D. and Stein, J. 1986. Indexing in an object-oriented DBMS. In Proc. 1st Workshop on Object-Oriented Database Systems, Asilomar, September 1986.Google Scholar
  53. Morgenstein, J.P. 1980. Computer based management information systems embodying answer accuracy as a user parameter. PhD Thesis, U.C. Berkeley.Google Scholar
  54. O'day, V. and Jeffries, R. 1993. Orienteering in an information landscape: How information seekers get from here to there. In INTERCHI.Google Scholar
  55. Ohno, P. 1998. Visionary. Informix Magazine.Google Scholar
  56. Olken, F. 1993. Random sampling from databases. PhD Thesis, University of California, Berkeley.Google Scholar
  57. Papadopoulos, G. Chief Technology Officer. 1997. Sun Microsystems. Untitled talk. Berkeley NOWRetreat, July 1997.Google Scholar
  58. Perlin, K. and Fox, D. 1993. Pad: An alternative approach to the computer interface. In Proc. ACM SIGGRAPH, Anaheim, pp. 57–64.Google Scholar
  59. Pilot Software 1998. Announces release of PDSS 6.0. URL http://www.pilotsw.com/about/pressrel/pr72998.htm.Google Scholar
  60. Raman, V., Chou, A., and Hellerstein, J.M. 1999a. Scalable spreadsheets for interactive data analysis. In DMKD Workshop.Google Scholar
  61. Raman, V., Raman, B., and Hellerstein, J.M. 1999b. Online dynamic reordering for interactive data processing. In Proc. International Conference on Very Large Data Bases, Edinburgh.Google Scholar
  62. Rao, J. and Ross, K.A. 1998. Reusing invariants: A new strategy for correlated queries. In Proc. ACM-SIGMOD International Conference on Management of Data, Seattle, June 1998.Google Scholar
  63. Red Brick Systems, Inc. 1998. Red brick warehouse. URL http://www.redbrick.com/products/rbw/rbw.html.Google Scholar
  64. Seshadri, P. and Swami, A. 1995. Generalized partial indexes. In Proc. 11th IEEE International Conference on Data Engineering, Taipei, March 1995.Google Scholar
  65. Shneiderman, B. 1982. The future of interactive systems and the emergence of direct manipulation. Behavior and Information Technology, 1(3):237–256.Google Scholar
  66. Shukla, A., Deshpande, P., and Naughton, J.F. 1998. Materialized view selection for multidimensional datasets. In Proc. 24th International Conference on Very Large Data Bases, New York City.Google Scholar
  67. Silberschatz, A., Read, R.L., and Fussell, D.S. 1992. A multi-resolution relational data model. In Proc. 18th International Conference on Very Large Data Bases, Vancouver, August 1992.Google Scholar
  68. QL 1998. Server 7.0 OLAP services. URL http://www.microsoft.com/backoffice/sql/70/whpprs/olapoverview.htm.Google Scholar
  69. Stonebraker, M. 1989. The case for partial indexes. SIGMOD Record, 18(4):4–11.CrossRefGoogle Scholar
  70. Stonebraker, M. and Kemnitz, G. 1991. The POSTGRES Next-Generation database management system. Communications of the ACM, 34(10):78–92.CrossRefGoogle Scholar
  71. Tan, K., Goh, C.H., and Ooi, B.C. 1999. Online feedback for nested aggregate queries with multi-threading. In Prov. International Conference on Very Large Data Bases, Edinburgh.Google Scholar
  72. Transaction Processing Council. TPC-D Rev. 1.2.3 Benchmark Specification.URLhttp://www.tpc.org/dspec.html.Google Scholar
  73. Vrbsky, S.V. and Liu, J.W.S. 1993. APPROXIMATE-A query processor that produces monotonically improving approximate answers. IEEE Transactions on Knowledge and Data Engineering, 5(6):1056–1068.CrossRefGoogle Scholar
  74. Waldspurger, C.A. and Weihl, W.E. 1995. Lottery scheduling: Flexible proportional-share resource management. In First Symposium on Operating Systems Design and Implementation (OSDI).Google Scholar
  75. Walter, T., Chief Technical Officer. 1998. NCR parallel systems. Complex queries. NSF Database Systems Industrial/Academic Workshop, October 1998.Google Scholar
  76. Wilschut, A.N. and Apers, P.M.G. 1991. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Info. Sys. (PDIS), pages 68-77, Miami Beach, December 1991.Google Scholar
  77. Winter, R. and Auerbach, K. 1998. The big time: 1998 winter VLDB survey. Database Programming and Design.Google Scholar
  78. Zilberstein, S. and Russell, S.J. 1996. Optimal composition of real-time systems. Artificial Intelligence, 82(1/2):181–213.CrossRefMathSciNetGoogle Scholar

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Joseph M. Hellerstein
    • 1
  • Ron Avnur
    • 1
  • Vijayshankar Raman
    • 1
  1. 1.Computer Science DivisionU.C.BerkeleyUSA

Personalised recommendations