Optimizing Scientific Databases for Client Side Data Processing

  • Etzard Stolte
  • Gustavo Alonso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2287)

Abstract

Databases are now just one building block in complex multi-tier architectures. In general, however, they are still designed and optimized with little regard for the applications that run on top of them. This problem is particularly acute in scientific applications, where the data is usually processed at the client and conventional server-side optimizations are therefore of limited help. In this paper we present a variety of techniques and a novel client/server architecture designed to optimize the client-side processing of scientific data. The core of our approach is to store frequently accessed data as relatively small, wavelet-encoded segments. These segments can be processed at different qualities and resolutions, thereby enabling efficient processing of very large data volumes. Experimental results demonstrate that our approach significantly reduces overhead (I/O, transfer across the network, decoding, and analysis), does not require changes to the analysis routines, and provides all possible resolution ranges.
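To illustrate how a wavelet-encoded segment can be read at several resolutions, the following is a minimal sketch using the Haar wavelet. It is not the authors' implementation: the abstract names neither the wavelet family nor the storage layout, so the choice of Haar, the coarse-to-fine coefficient ordering, and the function names `haar_encode` and `haar_decode` are all assumptions made for illustration.

```python
from typing import List, Optional


def haar_encode(signal: List[float]) -> List[float]:
    """Haar-decompose a segment whose length is a power of two.

    Assumed layout (illustrative only): overall average first, then detail
    coefficients ordered from the coarsest level to the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("segment length must be a power of two")
    approx = [float(x) for x in signal]
    details: List[float] = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        avgs = [(approx[i] + approx[i + 1]) / 2 for i in pairs]
        diffs = [(approx[i] - approx[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up in front
        approx = avgs
    return approx + details


def haar_decode(coeffs: List[float], levels: Optional[int] = None) -> List[float]:
    """Reconstruct 2**levels samples from the first 2**levels coefficients.

    With levels=None the full-resolution segment is rebuilt. A smaller value
    reads only a prefix of the coefficients and yields a block-averaged
    (lower-resolution) version of the segment.
    """
    max_levels = len(coeffs).bit_length() - 1
    if levels is None:
        levels = max_levels
    if not 0 <= levels <= max_levels:
        raise ValueError("levels out of range")
    approx = list(coeffs[:1])
    pos = 1
    for _ in range(levels):
        diffs = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        refined: List[float] = []
        for a, d in zip(approx, diffs):
            refined.extend((a + d, a - d))  # invert the average/difference step
        approx = refined
    return approx


segment = [9, 7, 3, 5, 6, 10, 2, 6]
encoded = haar_encode(segment)
full = haar_decode(encoded)            # all 8 samples
half = haar_decode(encoded, levels=2)  # 4 samples, from 4 coefficients
```

Because a resolution of `2**levels` samples depends only on the first `2**levels` coefficients in this layout, a client could in principle request just that prefix of a segment, which is one way the reduction in I/O, transfer, and decoding cost described above might arise.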

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Etzard Stolte
    • 1
  • Gustavo Alonso
    • 1
  1. Dept. of Computer Science, Swiss Federal Institute of Technology (ETH), ETH Zentrum, Zürich, Switzerland
