Autoscaling tiered cloud storage in Anna

Abstract

In this paper, we describe how we extended a distributed key-value store called Anna into an autoscaling, multi-tier service for the cloud. In its extended form, Anna is designed to overcome the narrow cost–performance limitations typical of current cloud storage systems. We describe three key aspects of Anna’s new design: multi-master selective replication of hot keys, a vertical tiering of storage layers with different cost–performance trade-offs, and horizontal elasticity of each tier to add and remove nodes in response to load dynamics. Anna’s policy engine uses these mechanisms to balance service-level objectives around cost, latency, and fault tolerance. Experimental results explore the behavior of Anna’s mechanisms and policy, exhibiting orders of magnitude efficiency improvements over both commodity cloud KVS services and research systems.
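The policy described above can be illustrated with a small sketch. This is a hypothetical rendering of the idea of multi-master selective replication plus tier demotion, not Anna's actual code: the thresholds, tier names, and function are assumptions chosen for illustration.

```python
# Hypothetical sketch of the placement idea in the abstract: hot keys get
# extra replicas in the fast (memory) tier so load spreads across nodes;
# cold keys are demoted to the cheap (disk) tier. All names and numbers
# here are illustrative, not Anna's actual API or policy constants.

HOT_THRESHOLD = 1000   # accesses per epoch above which a key counts as hot
COLD_THRESHOLD = 10    # accesses per epoch below which a key is demoted

def place(key, access_count, current_tier):
    """Decide (tier, replication factor) for one key."""
    if access_count > HOT_THRESHOLD:
        # Multi-master selective replication: scale the replica count
        # with observed load, up to a small cap.
        return ("memory", min(8, access_count // HOT_THRESHOLD + 1))
    if access_count < COLD_THRESHOLD and current_tier == "memory":
        # Demote cold data to the cheaper disk tier with a single replica.
        return ("disk", 1)
    return (current_tier, 1)
```

In the paper's design, decisions like these are made by the policy engine, which trades off the cost, latency, and fault-tolerance objectives rather than using fixed thresholds.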

Notes

  1. Note that repartitioning overhead is not as high as in Sect. 7.4 because here we are using more machines and add only one new node, as opposed to four in that experiment.


Corresponding author

Correspondence to Chenggang Wu.

Appendix

We include pseudocode for the algorithms described in Sect. 5 here. Note that some algorithms included here rely on a latency objective, which may or may not be specified. When no latency objective is specified, Anna aspires to its unsaturated request latency (2.5 ms) to provide the best possible performance but caps spending at the specified budget.
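The latency-objective fallback described above can be sketched as follows. This is a minimal illustration under assumed names (the function, its parameters, and the per-node cost model are not from the paper's code); only the 2.5 ms unsaturated latency figure comes from the text.

```python
# Sketch of the fallback: when no latency SLO is given, the policy targets
# Anna's unsaturated request latency (2.5 ms, from the text) while capping
# spending at the specified budget. Function and parameter names are assumed.

UNSATURATED_LATENCY_MS = 2.5  # Anna's best-case request latency

def effective_objective(latency_slo_ms, budget_dollars, cost_per_node):
    """Return (target latency in ms, max node count) for the policy engine."""
    target = latency_slo_ms if latency_slo_ms is not None else UNSATURATED_LATENCY_MS
    max_nodes = int(budget_dollars // cost_per_node)  # hard spending cap
    return target, max_nodes
```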

[Pseudocode listings (figures a–e) omitted]


Cite this article

Wu, C., Sreekanti, V. & Hellerstein, J.M. Autoscaling tiered cloud storage in Anna. The VLDB Journal 30, 25–43 (2021). https://doi.org/10.1007/s00778-020-00632-7

Keywords

  • Autoscaling
  • Key-value store
  • Cloud storage system
  • Data replication
  • Cost efficiency