Abstract
Partitioning methods for spatial databases should minimize the I/O time incurred during query execution. This paper explores optimal partitioning of two-dimensional data for a class of queries and develops multi-disk allocation techniques that maximize the degree of I/O parallelism obtained in each case. We show that hexagonal partitioning has optimal I/O performance for circular queries among all partitioning methods that use convex non-overlapping regions. An analysis and extension of this result to all possible partitioning techniques is also given. For rectangular queries, we show that hexagonal partitioning has better overall I/O performance over a general class of range queries, except for rectilinear queries, for which rectangular grid partitioning is superior. Parallel storage and retrieval algorithms for hexagonal partitioning can be constructed from existing algorithms for rectangular grid partitioning. Some of these results carry over to circular partitioning of the data, which is an example of partitioning with non-convex regions.
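The intuition behind the circular-query claim can be illustrated with a short calculation. The sketch below is not taken from the paper. It assumes that each cell of the partition corresponds to one unit of I/O, that the query is a disk whose center is uniformly distributed over the data space, and that all cells are congruent regular polygons of equal area. Under these assumptions, the expected number of cells a query touches equals the area of a cell grown outward by the query radius (the Steiner formula for convex sets) divided by the cell area, so the cell shape with the smallest perimeter gives the lowest expected cost. The function and constant names are illustrative only.

```python
import math


def regular_polygon_perimeter(n_sides: int, area: float) -> float:
    """Perimeter of a regular polygon with n_sides sides and the given area."""
    # area = n * s^2 / (4 * tan(pi / n)), so perimeter = n * s
    #      = 2 * sqrt(area * n * tan(pi / n)).
    return 2.0 * math.sqrt(area * n_sides * math.tan(math.pi / n_sides))


def expected_cells_hit(perimeter: float, area: float, radius: float) -> float:
    """Expected number of convex cells intersected by a disk query.

    Assumes a uniformly random query center (an assumption of this sketch).
    A disk of the given radius intersects a convex cell exactly when its
    center lies in the cell grown outward by that radius, and the grown
    cell has area: area + perimeter * radius + pi * radius^2.
    """
    return (area + perimeter * radius + math.pi * radius ** 2) / area


CELL_AREA = 1.0  # hypothetical: one cell holds one disk block
# The three regular polygons that tile the plane.
SHAPES = {"triangle": 3, "square": 4, "hexagon": 6}


def compare(radius: float) -> dict:
    """Expected cells hit per tiling shape for a disk query of this radius."""
    return {
        name: expected_cells_hit(
            regular_polygon_perimeter(n, CELL_AREA), CELL_AREA, radius
        )
        for name, n in SHAPES.items()
    }


if __name__ == "__main__":
    for r in (0.25, 0.5, 1.0):
        costs = compare(r)
        line = ", ".join(f"{name}: {cost:.3f}" for name, cost in costs.items())
        print(f"radius {r}: {line}")
```

For any positive radius the hexagonal cells give the smallest value, followed by squares and then triangles. This only compares the three regular tilings; the general statement over all convex non-overlapping regions is the result established in the paper.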
Cite this article
Ferhatosmanoğlu, H., Agrawal, D., Eğecioğlu, Ö. et al. Optimal Data-Space Partitioning of Spatial Data for Parallel I/O. Distributed and Parallel Databases 17, 75–101 (2005). https://doi.org/10.1023/B:DAPD.0000045550.56749.75