Optimal Distributed Declustering Using Replication

  • Keith B. Frikken
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3363)

Abstract

A common technique for improving the performance of database query retrieval is to decluster the database among multiple disks so that retrievals can be parallelized. In this paper we focus on answering range queries over a multidimensional database in which each dimension is divided uniformly to obtain tiles that are placed on different disks. A significant amount of research has addressed how to place the records on disks to minimize the retrieval time. Recently, the idea of using replication (i.e., placing records on more than one disk) to improve performance has been introduced. When using replication there are two goals: i) to minimize the retrieval time and ii) to minimize the scheduling overhead of determining which disk serves each record when processing a query. The previously known replicated declustering schemes with low retrieval times are randomized, and one of the primary advantages of randomized schemes is that, with high probability, they balance the load evenly among the disks for large queries. In this paper we introduce a new class of replicated placement schemes, called the shift schemes, that: i) are deterministic, ii) have retrieval performance comparable to that of the randomized schemes, iii) have a strictly optimal retrieval time for all large queries, and iv) have a more efficient query scheduling algorithm than the randomized placements. Furthermore, we present experimental results suggesting that the shift schemes have stronger average performance (in terms of retrieval times) than the randomized schemes.
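The two goals named in the abstract, retrieval time and scheduling overhead, can be made concrete with a small sketch. The abstract does not define the shift schemes, so the placement below is only a hypothetical stand-in: the first copy of tile (i, j) goes to disk (i + j) mod M, and the second copy is assumed to sit a fixed offset `shift` away. The function names, the offset, and the augmenting-path scheduler are all illustrative assumptions, not the paper's algorithm. The retrieval time of a query is taken to be the largest number of tiles any single disk must serve.

```python
from math import ceil


def copies(i, j, num_disks, shift):
    """Disks holding tile (i, j) under a HYPOTHETICAL two-copy placement."""
    primary = (i + j) % num_disks
    return (primary, (primary + shift) % num_disks)


def try_schedule(tiles, num_disks, shift, cap):
    """Assign every tile to one of its disks, at most `cap` tiles per disk.

    Uses augmenting paths (bipartite matching with disk capacities).
    Returns a dict disk -> list of tiles, or None if `cap` is infeasible.
    """
    assigned = {d: [] for d in range(num_disks)}

    def place(tile, seen):
        for d in copies(tile[0], tile[1], num_disks, shift):
            if d in seen:
                continue
            seen.add(d)
            if len(assigned[d]) < cap:
                assigned[d].append(tile)
                return True
            # Disk d is full: try to move one of its tiles elsewhere.
            for k, other in enumerate(assigned[d]):
                if place(other, seen):
                    assigned[d][k] = tile
                    return True
        return False

    for tile in tiles:
        if not place(tile, set()):
            return None
    return assigned


def retrieval_time(tiles, num_disks, shift):
    """Smallest per-disk load for which a valid schedule exists."""
    cap = ceil(len(tiles) / num_disks)  # no schedule can beat this bound
    while try_schedule(tiles, num_disks, shift, cap) is None:
        cap += 1
    return cap


# A 3 x 3 range query on 5 disks (9 tiles, so the lower bound is 2).
query = [(i, j) for i in range(3) for j in range(3)]
print(retrieval_time(query, 5, 0))  # shift 0: both copies coincide, no real replication
print(retrieval_time(query, 5, 2))  # second copy offset by 2 disks
```

In this toy instance the unreplicated placement needs 3 parallel accesses, because three tiles share one disk, while the replicated placement reaches the lower bound of 2. The scheduler is where the overhead shows up: the matching step must run for each query before any disk is read, which is the cost a more efficient scheduling algorithm aims to reduce.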

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Keith B. Frikken (1)
  1. CERIAS and Department of Computer Sciences, Purdue University, West Lafayette
