Optimal Distributed Declustering Using Replication
A common technique for improving the performance of database query retrieval is to decluster the database among multiple disks so that retrievals can be parallelized. In this paper we focus on answering range queries over a multidimensional database in which each dimension is divided uniformly to obtain tiles that are placed on different disks. A significant amount of research has addressed how to place the records on disks to minimize the retrieval time. Recently, the idea of using replication (i.e., placing records on more than one disk) to improve performance has been introduced. When using replication there are two goals: i) to minimize the retrieval time and ii) to minimize the scheduling overhead, that is, the time it takes to determine which disk retrieves a specific record when processing a query. The previously known replicated declustering schemes with low retrieval times are randomized, and one of the primary advantages of randomized schemes is that, with high probability, they balance the load evenly among the disks for large queries. In this paper we introduce a new class of replicated placement schemes, called the shift schemes, that: i) are deterministic, ii) have retrieval performance comparable to that of the randomized schemes, iii) have a strictly optimal retrieval time for all large queries, and iv) have a more efficient query scheduling algorithm than those for the randomized placements. Furthermore, we present experimental results that suggest that the shift schemes have stronger average performance (in terms of retrieval times) than the randomized schemes.
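To make the retrieval-time measure concrete, here is a minimal sketch under stated assumptions. It is not the shift scheme introduced in this paper. It assumes a two-dimensional grid of tiles, no replication, and a cost model in which each disk retrieves one tile per time unit, so the retrieval time of a range query is the largest number of its tiles stored on any single disk. The placement used is the classical Disk Modulo rule, tile (i, j) on disk (i + j) mod M, and all function names are hypothetical.

```python
from collections import Counter
from math import ceil


def disk_modulo(i, j, num_disks):
    """Illustrative placement: tile (i, j) is stored on disk (i + j) mod num_disks."""
    return (i + j) % num_disks


def retrieval_time(row0, col0, rows, cols, num_disks, place=disk_modulo):
    """Retrieval time of a rows x cols range query whose top-left tile is (row0, col0).

    Assumes each disk reads one tile per time unit, so the time is the
    maximum number of query tiles that land on the same disk.
    """
    load = Counter(
        place(i, j, num_disks)
        for i in range(row0, row0 + rows)
        for j in range(col0, col0 + cols)
    )
    return max(load.values())


def optimal_time(rows, cols, num_disks):
    """Lower bound on retrieval time: the query tiles spread perfectly evenly."""
    return ceil(rows * cols / num_disks)


if __name__ == "__main__":
    M = 4
    # A 1 x 4 query touches four different disks, so it meets the lower bound.
    print(retrieval_time(0, 0, 1, 4, M), optimal_time(1, 4, M))
    # A 2 x 2 query puts tiles (0, 1) and (1, 0) on the same disk,
    # so it needs two time units although the lower bound is one.
    print(retrieval_time(0, 0, 2, 2, M), optimal_time(2, 2, M))
```

The gap between the two values for the 2 x 2 query is the kind of imbalance that placing each tile on more than one disk aims to reduce, at the price of having to decide, per query, which copy of each tile to read.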