Algorithms for the Database Layout Problem
We present a formal analysis of the database layout problem, i.e., the problem of determining how database objects such as tables and indexes are assigned to disk drives. This layout directly affects the I/O performance of the entire system. The traditional approach, striping each object across all available disk drives, aims to maximize I/O parallelism; however, it is suboptimal when queries co-access two or more database objects (e.g., during a merge join of two tables), because of the resulting increase in random disk seeks. We adopt an existing model that accounts for both the benefit of I/O parallelism and the overhead of random disk accesses, for a query workload that includes co-access of database objects. The resulting optimization problem is intractable in general, so we use techniques from approximation algorithms to obtain provable performance guarantees. We show that optimally exploiting I/O parallelism alone suggests striping data objects uniformly (even for heterogeneous files and disks), whereas optimizing random disk access alone would assign each data object to a single disk drive. This confirms the intuition that the two effects are in tension with each other. We give approximation algorithms that balance the trade-off between the two effects, and we show that our algorithm achieves the best possible approximation ratio.
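The tension described above can be illustrated with a deliberately simple toy cost model. This is a minimal sketch under stated assumptions, not the model adopted in the paper: we assume disks read in parallel at unit bandwidth (so a query waits for its most loaded disk), and we charge a flat, hypothetical `seek_cost` for every extra co-accessed object stored on the same disk. The names `transfer_time`, `seek_penalty`, and `query_cost` are illustrative only.

```python
from typing import Dict, List, Sequence

# Hypothetical toy layout: object name -> fraction of the object on each disk.
Layout = Dict[str, List[float]]


def transfer_time(layout: Layout, sizes: Dict[str, float], query: Sequence[str]) -> float:
    """Parallelism term (toy assumption): disks are read in parallel at unit
    bandwidth, so the query waits for the most heavily loaded disk."""
    num_disks = len(next(iter(layout.values())))
    loads = [sum(sizes[obj] * layout[obj][d] for obj in query) for d in range(num_disks)]
    return max(loads)


def seek_penalty(layout: Layout, query: Sequence[str], seek_cost: float) -> float:
    """Seek term (toy assumption): on each disk, every co-accessed object beyond
    the first that has data there adds a flat seek_cost."""
    num_disks = len(next(iter(layout.values())))
    penalty = 0.0
    for d in range(num_disks):
        present = sum(1 for obj in query if layout[obj][d] > 0)
        penalty += seek_cost * max(0, present - 1)
    return penalty


def query_cost(layout: Layout, sizes: Dict[str, float], query: Sequence[str], seek_cost: float) -> float:
    """Total toy cost of one query: parallel transfer time plus seek overhead."""
    return transfer_time(layout, sizes, query) + seek_penalty(layout, query, seek_cost)


sizes = {"A": 100.0, "B": 100.0}
striped = {"A": [0.5, 0.5], "B": [0.5, 0.5]}   # stripe both objects over 2 disks
separate = {"A": [1.0, 0.0], "B": [0.0, 1.0]}  # each object whole on its own disk

# A lone scan of A: striping halves the bottleneck load.
print(query_cost(striped, sizes, ["A"], seek_cost=80.0))   # 50.0
print(query_cost(separate, sizes, ["A"], seek_cost=80.0))  # 100.0
# A merge-join-style co-access of A and B: striping now pays seeks on both disks.
print(query_cost(striped, sizes, ["A", "B"], seek_cost=80.0))   # 260.0
print(query_cost(separate, sizes, ["A", "B"], seek_cost=80.0))  # 100.0
```

In this toy setting, striping wins for the single-object scan while keeping co-accessed objects on separate disks wins for the join, which is the qualitative trade-off the abstract describes; the actual cost model and the algorithms that optimize it are those of the paper, not this sketch.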