The VLDB Journal

, Volume 23, Issue 6, pp 845–870 | Cite as

SWORD: workload-aware data placement and replica selection for cloud data management systems

  • K. Ashwin Kumar
  • Abdul Quamar
  • Amol Deshpande
  • Samir Khuller
Special Issue Paper


Cloud computing is increasingly being seen as a way to reduce infrastructure costs and add elasticity, and is being used by a wide range of organizations. Cloud data management systems today need to serve a range of different workloads, from analytical read-heavy workloads to transactional (OLTP) workloads. For both the service providers and the users, it is critical to minimize the consumption of resources like CPU, memory, communication bandwidth, and energy, without compromising on service-level agreements if any. In this article, we develop a workload-aware data placement and replication approach, called SWORD, for minimizing resource consumption in such an environment. Specifically, we monitor and model the expected workload as a hypergraph and develop partitioning techniques that minimize the average query span, i.e., the average number of machines involved in the execution of a query or a transaction. We empirically justify the use of query span as the metric to optimize, for both analytical and transactional workloads, and develop a series of replication and data placement algorithms by drawing connections to several well-studied graph theoretic concepts. We introduce a suite of novel techniques to achieve high scalability by reducing the overhead of partitioning and query routing. To deal with workload changes, we propose an incremental repartitioning technique that modifies data placement in small steps without resorting to complete repartitioning. We propose the use of fine-grained quorums defined at the level of groups of data items to control the cost of distributed updates, improve throughput, and adapt to different workloads. We empirically illustrate the benefits of our approach through a comprehensive experimental evaluation for two classes of workloads. For analytical read-only workloads, we show that our techniques result in significant reduction in total resource consumption. For OLTP workloads, we show that our approach improves transaction latencies and overall throughput by minimizing the number of distributed transactions.


Cloud data management Hypergraph partitioning Data placement Replication Resource minimization Scalability 



This work was supported in part by NSF Grant CCF-0937865, an IBM Collaborative Research Award, and an Amazon AWS in Education Research grant.


  1. 1.
    MLPart: a hypergraph partitioning package. URL
  2. 2.
    hMETIS: a hypergraph partitioning package. URL
  3. 3.
    Alon, N., Seymour, P.D., Thomas, R.: A separator theorem for graphs with an excluded minor and its applications. In: STOC (1990)Google Scholar
  4. 4.
    Alpert, C.J.: The ISPD98 circuit benchmark suite. In: Proc. of Intl, Symposium on Physical Design (1998)Google Scholar
  5. 5.
    Asahiro, Y., Iwama, K., Tamaki, H., Tokuyama, T.: Greedily finding a dense subgraph. In: SWAT (1996)Google Scholar
  6. 6.
    Ayka, C., Cambazoglu, B., Bora, U.: Multi-level direct k-way hypergraph partitioning with multiple constraints and fixed vertices. J. Parallel Distrib. Comput. 68, 609–625 (2008)Google Scholar
  7. 7.
    Boppana, R.B.: Eigenvalues and graph bisection: an average-case analysis. In: FOCS (1987)Google Scholar
  8. 8.
    Bruno, N., Chaudhuri, S., Konig, A.C., Narasayya, V.R., Ramamurthy, R., Syamala, M.: Autoadmin project at Microsoft research: lessons learned. IEEE Data Eng. Bull. 34, 12–19 (2011)Google Scholar
  9. 9.
    Caldwell, A.E., Kahng, A.B., Markov, I.L.: Design and implementation of move-based heuristics for VLSI hypergraph partitioning. J. Exp. Algorithmics. 5, 5 (2000)CrossRefGoogle Scholar
  10. 10.
    Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: OSDI (2006)Google Scholar
  11. 11.
    Chowdhury, M., Zaharia, M., Ma, J., Jordan, M.I., Stoica., I.: Managing data transfers in computer clusters with orchestra. In: SIGCOMM (2011)Google Scholar
  12. 12.
    Curino, C., Zhang, Y., Jones, E.P.C., Madden, S.: Schism: a workload-driven approach to database replication and partitioning. PVLDB 3(1), 48–57 (2010)Google Scholar
  13. 13.
    J. Dittrich, Ruiz, J.A.Q., Jindal, A., Kargin, Y., Setty, V., Schad, J.: Hadoop++: making a yellow elephant run like a cheetah (without it even noticing). In: PVLDB, vol. 3, pp. 515–529 (2010)Google Scholar
  14. 14.
    Economou, D., Rivoire, S., Kozyrakis, C.: Full-system power analysis and modeling for server environments. In: MOBS (2006)Google Scholar
  15. 15.
    Eltabakh, M.Y., Tian, Y., Ozcan, F., Gemulla, R., Krettek, A., McPherson, J.: Cohadoop: flexible data placement and its exploitation in hadoop. PVLDB 4(9), 575–585 (2011)Google Scholar
  16. 16.
    Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)CrossRefzbMATHMathSciNetGoogle Scholar
  17. 17.
    Feige, U., Kortsarz, G., Peleg, D.: The dense k-subgraph problem. Algorithmica 29, 410–421 (1999)Google Scholar
  18. 18.
    Ferhatosmanoglu, H., Tosun, A., Ramachandran, A.: Replicated declustering of spatial data. In: PODS (2004)Google Scholar
  19. 19.
    Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness (1979)Google Scholar
  20. 20.
    Gray, J., Helland, P., O’Neil, P.E., Shasha, D.: The dangers of replication and a solution. In: SIGMOD (1996)Google Scholar
  21. 21.
    Ho, L., Wu, J., Liu, P.: Optimal algorithms for cross-rack communication optimization in mapreduce framework. In: CLOUD (2011)Google Scholar
  22. 22.
    Jones, E.P.C., Abadi, D.J., Madden, S.: Low overhead concurrency control for partitioned main memory databases. In: SIGMOD (2010)Google Scholar
  23. 23.
    Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. In: Proc. of DAC, pp. 343–348 (1998)Google Scholar
  24. 24.
    Karypis, G., Aggarwal, R., Kumar, V., Shekhar, S.: Multilevel hypergraph partitioning: application in VLSI domain. In: IEEE VLSI, pp. 69–529. (1999)Google Scholar
  25. 25.
    Koyutürk, M., Aykanat, C.: Iterative-improvement-based declustering heuristics for multi-disk databases. Inf. Syst. 30, 47–70 (2005)Google Scholar
  26. 26.
    Kumar, K.A., Quamar, A., Deshpande, A., Khuller, S.: Workload-Aware Data Placement and Replica Selection for Cloud Data Management Systems. University of Maryland Technical Report (2013)Google Scholar
  27. 27.
    Lakshman, A., Malik, P.: Cassandra: structured storage system on a P2P network. In: PODC (2009)Google Scholar
  28. 28.
    Liu, D., Shekhar, S.: Partitioning similarity graphs: a framework for declustering problems. Inf. Syst. 21,475–496 (1996)Google Scholar
  29. 29.
    Meyerhenke, H., Monien, B., Sauerwald, T.: A new diffusion-based multilevel algorithm for computing graph partitions. J. Parallel Distrib. Comput. 69(9), 750–761 (2009)CrossRefGoogle Scholar
  30. 30.
    Nehme, R., Bruno, N.: Automated partitioning design in parallel database systems. In: SIGMOD (2011)Google Scholar
  31. 31.
    Neves, T., Drummond, L.A., Ochi, L., Albuquerque, C., Uchoa, E.: Solving replica placement and request distribution in content distribution networks. Electron. Notes Discret. Math. 36, 89–96 (2010)CrossRefGoogle Scholar
  32. 32.
    Oktay, K.Y., Turk, A., Aykanat, C.: Selective replicated declustering for arbitrary queries. In: Euro-Par (2009)Google Scholar
  33. 33.
    Pavlo, A., Curino, C., Zdonik, S.: Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In: SIGMOD (2012)Google Scholar
  34. 34.
    Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: SIGMOD (2009)Google Scholar
  35. 35.
    Peris, J.R., Martinez, M.P., Alonso, G., Kemme, B.: Are quorums an alternative for data replication? TODS 28(3), 257–294 (2003)CrossRefGoogle Scholar
  36. 36.
    Peris, R.J., Martinez, M.P.: How to select a replication protocol according to scalability, availability and communication overhead. In: SRDS (2001)Google Scholar
  37. 37.
    Peris, R.J., Martnez, M.P., Kemme, B., Alonso, G.: How to select a replication protocol according to scalability, availability, and communication overhead. In: SRDS (2001)Google Scholar
  38. 38.
    Quamar, A., Kumar, K.A., Deshpande, A.: Sword: scalable workload-aware data placement for transactional workloads. In: EDBT (2013)Google Scholar
  39. 39.
    Simon, H.D., Teng, S.: How good is recursive bisection? SIAM J. Sci. Comput. 18(5), 1436–1445 (1997)CrossRefzbMATHMathSciNetGoogle Scholar
  40. 40.
    Tatarowicz, A.L., Curino, C., Jones, E.P.C., Madden, S.: Lookup tables: fine-grained partitioning for distributed databases. In: ICDE (2011)Google Scholar
  41. 41.
    Thain, D., Livny, M.: Building reliable clients and servers. In: Foster, I., Kesselman, C. (eds.) The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, Burlington (2003)Google Scholar
  42. 42.
    Tosun, A.A., Ferhatosmanoglu, H.: Optimal parallel I/O using replication. In: ICPP (1997)Google Scholar
  43. 43.
    Tosun, A.S.: Replicated declustering for arbitrary queries. In: ACM Symposium on Applied Computing (2004)Google Scholar
  44. 44.
    Wang, X., Smalter, A., Huan, J., Lushington, G.H.: G-hash: towards fast kernel-based similarity search in large graph databases. In: EDBT (2009)Google Scholar
  45. 45.
    White, T.: Hadoop: The Definitive Guide. O’Reilly Media, 1st edn. ISBN:0596521979 (2009)Google Scholar
  46. 46.
    Wolfson, O., Jajodia, S., Huang, Y.: An adaptive data replication algorithm. TODS 22, 255–314 (1997)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • K. Ashwin Kumar
    • 1
  • Abdul Quamar
    • 1
  • Amol Deshpande
    • 1
  • Samir Khuller
    • 1
  1. 1.University of MarylandCollege ParkUSA

Personalised recommendations