Dynamic Workload-Based Partitioning for Large-Scale Databases

  • Miguel Liroz-Gistau
  • Reza Akbarinia
  • Esther Pacitti
  • Fabio Porto
  • Patrick Valduriez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7447)

Abstract

Applications with very large databases, where data items are continuously appended, are becoming increasingly common. Efficient workload-based data partitioning is therefore a key requirement for good performance in those applications that have complex access patterns, e.g. scientific applications. However, existing workload-based approaches are executed statically and cannot be applied to very large databases. In this paper, we propose DynPart, a dynamic partitioning algorithm for continuously growing databases. DynPart efficiently adapts the data partitioning to the arrival of new data elements by taking into account the affinity of new data with queries and fragments. In contrast to existing static approaches, our approach offers a constant execution time, regardless of the size of the database, while obtaining very good partitioning efficiency. We validated our solution through experimentation over real-world data; the results show its effectiveness.
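
To make the idea of affinity-driven placement concrete, the following is a minimal illustrative sketch in Python. It is not the DynPart algorithm itself: the abstract does not define the affinity measure, so the sketch assumes affinity is the number of queries shared between a new item and a fragment. The function name place_new_items, the data layout, and the capacity limit are all hypothetical.

```python
def place_new_items(new_items, fragments, capacity):
    """Greedy placement sketch (an assumption, not the paper's method).

    new_items: dict item_id -> set of query ids that access the item
    fragments: dict frag_id -> {"items": set, "queries": set}
    capacity:  hypothetical maximum number of items per fragment
    """
    placement = {}
    for item_id, item_queries in new_items.items():
        best_frag, best_score = None, -1
        # Visit fragments in sorted order so ties resolve deterministically.
        for frag_id in sorted(fragments):
            frag = fragments[frag_id]
            if len(frag["items"]) >= capacity:
                continue  # fragment is full, skip it
            # Assumed affinity: number of queries shared with the fragment.
            score = len(item_queries & frag["queries"])
            if score > best_score:
                best_frag, best_score = frag_id, score
        if best_frag is None:
            raise ValueError("no fragment has free capacity")
        # Place the item and record the queries that now touch the fragment.
        fragments[best_frag]["items"].add(item_id)
        fragments[best_frag]["queries"] |= item_queries
        placement[item_id] = best_frag
    return placement
```

In this sketch the work per batch depends only on the number of new items and the number of fragments, not on how many items are already stored, which is the kind of property the abstract contrasts with static approaches.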

Keywords

Execution Time; Data Item; Dynamic Algorithm; Static Partitioning; Dynamic Partitioning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Miguel Liroz-Gistau (1)
  • Reza Akbarinia (1)
  • Esther Pacitti (2)
  • Fabio Porto (3)
  • Patrick Valduriez (1)

  1. INRIA & LIRMM, Montpellier, France
  2. INRIA & LIRMM, University Montpellier 2, Montpellier, France
  3. LNCC, Petropolis, Brazil
