Accessing Scientific Data: Simpler is Better

Riedewald, Mirek; Agrawal, Divyakant; El Abbadi, Amr; Korn, Flip

doi:10.1007/978-3-540-45072-6_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2750)

Included in the following conference series:

International Symposium on Spatial and Temporal Databases

Abstract

A variety of index structures has been proposed for supporting fast access and summarization of large multidimensional data sets. Some of these indices are fairly involved, hence few are used in practice. In this paper we examine how to reduce the I/O cost by taking full advantage of recent trends in hard disk development, which favor reading large chunks of consecutive disk blocks over seeking and searching. We present the Multiresolution File Scan (MFS) approach, which is based on a surprisingly simple and flexible data structure that outperforms sophisticated multidimensional indices, even if they are bulk-loaded and hence optimized for query processing. Our approach also has the advantage that it can incorporate a priori knowledge about the query workload. It readily supports summarization using distributive (e.g., count, sum, max, min) and algebraic (e.g., avg) aggregate operators.
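
The distinction between distributive and algebraic aggregate operators can be illustrated with a minimal Python sketch. It is not the MFS data structure, which the abstract does not specify. The class name `Partial` and its fields are hypothetical, chosen only for this illustration. Distributive aggregates (count, sum, min, max) of a combined data set follow directly from the same aggregates of its parts, while an algebraic aggregate such as avg is derived from a fixed number of distributive ones, here sum and count.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Partial:
    """Hypothetical summary of one chunk of values (illustration only)."""
    count: int
    total: float
    lo: float
    hi: float

    @staticmethod
    def of(values: Iterable[float]) -> "Partial":
        # Summarize a non-empty chunk of raw values.
        vals = list(values)
        if not vals:
            raise ValueError("chunk must not be empty")
        return Partial(len(vals), sum(vals), min(vals), max(vals))

    def merge(self, other: "Partial") -> "Partial":
        # Distributive operators: the result for the union of two chunks
        # is computed from the per-chunk results alone.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.lo, other.lo),
            max(self.hi, other.hi),
        )

    @property
    def avg(self) -> float:
        # Algebraic operator: avg is not mergeable by itself, but it is a
        # function of two distributive aggregates (sum and count).
        return self.total / self.count


if __name__ == "__main__":
    a = Partial.of([1.0, 2.0, 3.0])
    b = Partial.of([10.0])
    merged = a.merge(b)
    print(merged.count, merged.total, merged.lo, merged.hi, merged.avg)
```

Because merging needs only the per-chunk summaries, a coarser summary can in principle be built from finer ones without rereading the raw data, which is the property that makes pre-aggregated summaries at several resolutions possible.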

This work was supported by NSF grants IIS98-17432, EIA99-86057, EIA00-80134, and IIS02-09112.

Author information

Authors and Affiliations

Cornell University, Ithaca, NY
Mirek Riedewald
University of California, Santa Barbara, CA
Divyakant Agrawal & Amr El Abbadi
AT&T Labs-Research, Florham Park, NJ
Flip Korn

Editor information

Editors and Affiliations

Research Academic Computer Technology Institute, Patras, Greece
Thanasis Hadzilacos
Data Engineering Research Lab, Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece
Yannis Manolopoulos
Flinders University, Adelaide, Australia
John Roddick
Department of Informatics, University of Piraeus, Greece
Yannis Theodoridis

Cite this paper

Riedewald, M., Agrawal, D., El Abbadi, A., Korn, F. (2003). Accessing Scientific Data: Simpler is Better. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds) Advances in Spatial and Temporal Databases. SSTD 2003. Lecture Notes in Computer Science, vol 2750. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45072-6_13

DOI: https://doi.org/10.1007/978-3-540-45072-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40535-1
Online ISBN: 978-3-540-45072-6
eBook Packages: Springer Book Archive
